You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by rsltrifork <gi...@git.apache.org> on 2018/07/09 12:25:30 UTC

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

GitHub user rsltrifork opened a pull request:

    https://github.com/apache/flink/pull/6288

    [FLINK-9703] [flink-mesos] Expose Prometheus port for TM through Mesos

    ## What is the purpose of the change
    
    To enable Task Managers, when set up by mesos, to expose a port for Prometheus monitoring.
    
    ## Brief change log
    
      - If "metrics.reporter.prom.port' is configured, make sure mesos assigns a port for it.
    
    ## Verifying this change
    
    *(Please pick either of the following options)*
    
    This change is already covered by existing tests, such as *(please describe tests)*.
    
    *(or)*
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: yes
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Safetrack/flink FLINK-9703

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6288
    
----
commit ca55557651ca9164d3611d2dfc3f1ce934a6f9ae
Author: Rune Skou Larsen <rs...@...>
Date:   2018-07-09T11:54:51Z

    [FLINK-9703] Expose Prometheus port for TM through Mesos

----


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/6288


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r201657551
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    --- End diff --
    
    ah,I think i misunderstood earlier what the code does.
    
    So what we _are_ doing is scanning the taskmanager config for port config options, gather the keys, and later these are overridden by mesos with some port?
    
    If so, is there any way to restrict ports to certain ranges? For the taskmanager port this isn't important as for all intents and purposes it is only used internally anyway, but for reporters this port is very much user-facing.
    In case of the PrometheusReporter this would imply that prometheus is always started _after_ the TM, as we can't determine the port beforehand. Is that correct?


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r200987982
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -85,11 +85,11 @@
     	 * @param taskID the taskID for this worker.
     	 */
     	public LaunchableMesosWorker(
    -			MesosArtifactResolver resolver,
    -			MesosTaskManagerParameters params,
    -			ContainerSpecification containerSpec,
    -			Protos.TaskID taskID,
    -			MesosConfiguration mesosConfiguration) {
    +		MesosArtifactResolver resolver,
    +		MesosTaskManagerParameters params,
    --- End diff --
    
    please try to avoid formatting changes.


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r201331084
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    --- End diff --
    
    Is this allowed to contain port ranges?


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r202062001
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    +		containerSpec.getDynamicConfiguration().keySet().stream()
    +			.filter(key -> key.endsWith(".port") || key.endsWith(".ports"))  // This matches property naming convention
    --- End diff --
    
    You have to check the configured value as well, I know of at least one case where negative ports are used to disable features (and we wouldn't want enable these again).
    Are there any use-cases where one might _not_ want mesos to provide the port?


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r201331518
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    +		containerSpec.getDynamicConfiguration().keySet().stream()
    +			.filter(key -> key.endsWith(".port"))  // This matches property naming convention
    --- End diff --
    
    there are at least 2 instances where the key ends with "port**s**".
    https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#query-proxy-ports
    https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#query-server-ports


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by rsltrifork <gi...@git.apache.org>.
Github user rsltrifork commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r201645393
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    --- End diff --
    
    Yes, ranges are allowed as value. In fact *anything* is allowed as value, because the implementation does not look at which value is assigned to a port key. It always overrules your assigned port value and replaces it with a single port provided by Mesos.
    
    My understanding is, that port ranges in the first place are only used for one purpose: To allow picking a single available port from within the range. For example when running multiple TMs on the same machine, where ports would otherwise clash. But this complexity is not needed when spawning TMs with Mesos, because Mesos will always choose and assign a port, which is guarenteed to be available. Therefore, it's safe to only request a single port for each configured portKey.


---

[GitHub] flink issue #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port for TM ...

Posted by zentol <gi...@git.apache.org>.
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/6288
  
    This doesn't really work; the configured name for the reporter (ìn this case `prom`) is arbitrary and can be chosen by the user. As such this approach will fail the moment users choose a different name. 
    
    It would also be good if we could figure out a more general solution to the problem, as this will affect other components now/in the future.


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by rsltrifork <gi...@git.apache.org>.
Github user rsltrifork commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r201943310
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    --- End diff --
    
    > So what we are doing is scanning the taskmanager config for port config options, gather the keys, and later these are overridden by mesos with some port?
    
    Yes.
    
    > In case of the PrometheusReporter this would imply that prometheus is always started after 
    > the TM, as we can't determine the port beforehand. Is that correct?
    
    No. Prometheus supports dynamicly reloading it's scrape configuration. So the other part of the monitoring solution is our Prometheus script, which periodically querries mesos to maintain an up-to-date list of active TM prometheus ports to scrape from. We're working with Mesosphere to open source this Prometheus scripting in DC/OS, but before that makes sense, we need Flink to let Mesos assign the scrape port to the TM container - hence this PR, which I hope will be quickly merged.


---

[GitHub] flink pull request #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port ...

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/6288#discussion_r202171601
  
    --- Diff: flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/LaunchableMesosWorker.java ---
    @@ -332,6 +334,22 @@ public String toString() {
     		return taskInfo.build();
     	}
     
    +	/**
    +	 * Get port keys representing the TM's configured endpoints. This includes mandatory TM endpoints such as
    +	 * data and rpc as well as optionally configured endpoints for services such as prometheus reporter
    +	 *
    +	 * @return A deterministicly ordered Set of port keys to expose from the TM container
    +	 */
    +	private Set<String> getPortKeys() {
    +		LinkedHashSet<String> tmPortKeys = new LinkedHashSet<>(Arrays.asList(TM_PORT_KEYS));
    +		containerSpec.getDynamicConfiguration().keySet().stream()
    +			.filter(key -> key.endsWith(".port") || key.endsWith(".ports"))  // This matches property naming convention
    --- End diff --
    
    I agree that simply taking all configuration values which end with `port` and `ports` is problematic. What about configuration values whose value is needed, e.g. remote ports, and should not be overwritten. For example the `jobmanager.rpc.port` is one of these configuration values.
    
    I would rather prefer a mesos specific configuration value which we can use to define whose ports need to be dynamically assigned. For example `mesos.dynamic-port-assignment: "metrics.prom.port, taskmanager.rpc.port, taskmanager.data.port"`. What do you think?


---

[GitHub] flink issue #6288: [FLINK-9703] [flink-mesos] Expose Prometheus port for TM ...

Posted by rsltrifork <gi...@git.apache.org>.
Github user rsltrifork commented on the issue:

    https://github.com/apache/flink/pull/6288
  
    Thank you for the review, zentol. I've made a more generic solution now for exposing TM ports through Mesos - please have a look again.


---