Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2019/12/12 13:42:00 UTC

[jira] [Resolved] (FLINK-15082) Mesos App Master does not respect taskmanager.memory.total-process.size

     [ https://issues.apache.org/jira/browse/FLINK-15082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann resolved FLINK-15082.
-----------------------------------
    Resolution: Fixed

Fixed via

master:
b8f3e3a77d23076aff7378820056b3f8d43c55d1
792c6749975350eb16826e51cc0763dfbc3eb20a

1.10.0:
952a880b70cd13fe22ca8397611168e9e55d4823
d9eefeb0baa4c78202c116084bc39fddc322cc52

> Mesos App Master does not respect taskmanager.memory.total-process.size
> -----------------------------------------------------------------------
>
>                 Key: FLINK-15082
>                 URL: https://issues.apache.org/jira/browse/FLINK-15082
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Gary Yao
>            Assignee: Andrey Zagrebin
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Description*
>  When the Mesos App Master is started with {{taskmanager.memory.total-process.size}}, [the value is not respected|https://github.com/apache/flink/blob/d08beaa3255b3df96afe35f17e257df31a0d71ed/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L339]. 
> One can reproduce this when starting the App Master with the command below:
> {noformat}
> /bin/mesos-appmaster.sh \
> -Dtaskmanager.memory.total-process.size=2048m \
> -Djobmanager.heap.size=2048m \
> ...
> {noformat}
> The ClusterEntrypoint fails with the exception below. The cause is that the default value of {{mesos.resourcemanager.tasks.mem}} (1024 MB) is used as the total process memory size instead of the configured {{taskmanager.memory.total-process.size}}.
> {noformat}
> org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint MesosSessionClusterEntrypoint.
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
>         at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:126)
> Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
>         at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>         at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
>         ... 2 more
> Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (134217728 bytes), Framework Off-Heap Memory (134217728 bytes), Task Off-Heap Memory (0 bytes), Managed Memory (719407031 bytes) and Shuffle Memory (80530638 bytes) exceed configured Total Flink Memory (805306368 bytes).
>         at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveInternalMemoryFromTotalFlinkMemory(TaskExecutorResourceUtils.java:273)
>         at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithTotalProcessMemory(TaskExecutorResourceUtils.java:210)
>         at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:108)
>         at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:94)
>         at org.apache.flink.mesos.runtime.clusterframework.MesosTaskManagerParameters.create(MesosTaskManagerParameters.java:341)
>         at org.apache.flink.mesos.util.MesosUtils.createTmParameters(MesosUtils.java:109)
>         at org.apache.flink.mesos.runtime.clusterframework.MesosResourceManagerFactory.createActiveResourceManager(MesosResourceManagerFactory.java:80)
>         at org.apache.flink.runtime.resourcemanager.ActiveResourceManagerFactory.createResourceManager(ActiveResourceManagerFactory.java:58)
>         at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:170)
>         ... 9 more
> {noformat}
> *Expected Behavior*
>  * If {{taskmanager.memory.total-process.size}} and {{mesos.resourcemanager.tasks.mem}} are both set and their values differ, an exception should be thrown
>  * If only {{taskmanager.memory.total-process.size}} is set, its value should be respected
>  * If only {{mesos.resourcemanager.tasks.mem}} is set, its value should be respected
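
The three rules under "Expected Behavior" can be sketched as a small resolution function. This is only an illustration, not the code from the fix commits: the class and method names are hypothetical, sizes are simplified to whole megabytes, and IllegalArgumentException stands in for Flink's own configuration exception. The case where neither option is set is not covered by the issue; falling back to the 1024 MB default mentioned in the description is an assumption.

```java
import java.util.Optional;

// Hypothetical helper illustrating the expected behavior; not Flink's actual API.
final class TotalProcessMemoryResolver {

    // Default of mesos.resourcemanager.tasks.mem as stated in the issue description.
    static final int DEFAULT_TASKS_MEM_MB = 1024;

    /**
     * Resolves the total process memory (in MB) from the two options.
     *
     * @param totalProcessSizeMb value of taskmanager.memory.total-process.size, if set
     * @param mesosTasksMemMb    value of mesos.resourcemanager.tasks.mem, if set
     */
    static int resolveMb(Optional<Integer> totalProcessSizeMb,
                         Optional<Integer> mesosTasksMemMb) {
        if (totalProcessSizeMb.isPresent() && mesosTasksMemMb.isPresent()) {
            // Rule 1: both set and different -> fail fast.
            if (!totalProcessSizeMb.get().equals(mesosTasksMemMb.get())) {
                throw new IllegalArgumentException(
                        "taskmanager.memory.total-process.size (" + totalProcessSizeMb.get()
                                + " MB) and mesos.resourcemanager.tasks.mem ("
                                + mesosTasksMemMb.get() + " MB) are both set but differ");
            }
            return totalProcessSizeMb.get();
        }
        // Rules 2 and 3: whichever option is explicitly set wins.
        if (totalProcessSizeMb.isPresent()) {
            return totalProcessSizeMb.get();
        }
        if (mesosTasksMemMb.isPresent()) {
            return mesosTasksMemMb.get();
        }
        // Assumption: neither set, so use the documented default.
        return DEFAULT_TASKS_MEM_MB;
    }
}
```

With the command from the description (only the total process size set to 2048m), this sketch would return 2048 rather than silently using the 1024 MB default.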



--
This message was sent by Atlassian Jira
(v8.3.4#803005)