Posted to issues@flink.apache.org by "Addison Higham (JIRA)" <ji...@apache.org> on 2017/09/13 21:10:01 UTC

[jira] [Commented] (FLINK-7615) Under mesos when using a role, TaskManagers fail to schedule

    [ https://issues.apache.org/jira/browse/FLINK-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165288#comment-16165288 ] 

Addison Higham commented on FLINK-7615:
---------------------------------------

I should mention: looking at the code, this should also be a problem under Flink 1.2.0.

> Under mesos when using a role, TaskManagers fail to schedule
> ------------------------------------------------------------
>
>                 Key: FLINK-7615
>                 URL: https://issues.apache.org/jira/browse/FLINK-7615
>             Project: Flink
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.3.2
>            Reporter: Addison Higham
>
> When `mesos.resourcemanager.framework.role` is specified, TaskManagers are unable to start. An error message indicates that the requested resources cannot be satisfied. I sadly lost the logs, but essentially it appears that an offer extended by Mesos is accepted, but the request is made for resources under the default role (`*`) even though the offered resources all exist under the specified role.
> I believe this is likely because, while the framework properly starts under the specified role (meaning it only gets offers for that role), it isn't creating `Protos.Resource` objects with a role defined.
> This can be seen here: https://github.com/apache/flink/blob/release-1.3.2/flink-mesos/src/main/java/org/apache/flink/mesos/Utils.java#L72
> The Mesos docs for `Resource.Builder.setRole` (http://mesos.apache.org/api/latest/java/org/apache/mesos/Protos.Resource.Builder.html#setRole-java.lang.String-) allow for a role to be provided. (Note: this method is shown as deprecated as of Mesos 1.4.0, but for Mesos 1.0.1, the version Flink currently uses, it is the only mechanism.)
> I believe this should mostly be fixed by something like this:
> {code:java}
> 	/**
> 	 * Construct a scalar resource value.
> 	 */
> 	public static Protos.Resource scalar(String name, double value, Option<String> role) {
> 		Protos.Resource.Builder builder = Protos.Resource.newBuilder()
> 			.setName(name)
> 			.setType(Protos.Value.Type.SCALAR)
> 			.setScalar(Protos.Value.Scalar.newBuilder().setValue(value));
> 		if (role.isDefined()) {
> 			builder.setRole(role.get());
> 		}
> 		return builder.build();
> 	}
> {code}
> However, perhaps we want to consider upgrading to Mesos 1.4.x, which has the newer API for this (http://mesos.apache.org/api/latest/java/org/apache/mesos/Protos.Resource.ReservationInfo.Builder.html#setRole-java.lang.String-)
> In looking at the other options for ReservationInfo, I don't see any current need to expose any of those parameters for configuration, but perhaps some FLIP-6 work could benefit.
> [~till.rohrmann] any thoughts? I can implement a fix as above against mesos 1.0.1, but figured I would get your input before submitting a patch for this
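
To make the suspected mismatch concrete, here is a minimal, self-contained sketch. It is a simplified model and not Mesos's or Flink's actual matching logic: the `Resource` class, the `satisfies` check, and the role name `flink` are hypothetical stand-ins, and `java.util.Optional` is used in place of the `Option` type in the proposed patch. It only illustrates why a request built without a role (so defaulting to `*`) would not match an offer whose resources all sit under a specific role.

```java
import java.util.Optional;

public class RoleMismatchSketch {

    /** Hypothetical stand-in for a scalar Protos.Resource (not the Mesos class). */
    static final class Resource {
        final String name;
        final double value;
        final String role;

        Resource(String name, double value, String role) {
            this.name = name;
            this.value = value;
            this.role = role;
        }
    }

    /** The default role applied when no role is set on a resource. */
    static final String DEFAULT_ROLE = "*";

    /** Builds a scalar resource; falls back to the default role if none is given. */
    static Resource scalar(String name, double value, Optional<String> role) {
        return new Resource(name, value, role.orElse(DEFAULT_ROLE));
    }

    /**
     * Simplified assumption: a request can be served from an offered resource
     * only if the name and role match and enough of the resource is offered.
     */
    static boolean satisfies(Resource offered, Resource requested) {
        return offered.name.equals(requested.name)
            && offered.role.equals(requested.role)
            && offered.value >= requested.value;
    }

    public static void main(String[] args) {
        // The offer only contains resources under the (hypothetical) role "flink".
        Resource offered = scalar("cpus", 4.0, Optional.of("flink"));

        // Current behavior as described in the issue: no role on the request.
        Resource withoutRole = scalar("cpus", 1.0, Optional.empty());
        // Proposed behavior: the configured role is passed through.
        Resource withRole = scalar("cpus", 1.0, Optional.of("flink"));

        System.out.println(satisfies(offered, withoutRole)); // prints "false"
        System.out.println(satisfies(offered, withRole));    // prints "true"
    }
}
```

Under these assumptions, passing the configured role through to every resource the request builds is what lets the request line up with the offer.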


