Posted to dev@flink.apache.org by "Steven Zhen Wu (Jira)" <ji...@apache.org> on 2020/06/17 18:30:00 UTC
[jira] [Created] (FLINK-18350) [1.11.0] jobmanager complains `taskmanager.memory.process.size` missing
Steven Zhen Wu created FLINK-18350:
--------------------------------------
Summary: [1.11.0] jobmanager complains `taskmanager.memory.process.size` missing
Key: FLINK-18350
URL: https://issues.apache.org/jira/browse/FLINK-18350
Project: Flink
Issue Type: Bug
Components: Runtime / Configuration
Affects Versions: 1.11.0
Reporter: Steven Zhen Wu
We saw this failure during jobmanager startup. I know the exception says that `taskmanager.memory.process.size` is missing. We set it on the taskmanager side in `flink-conf.yaml`, but I am wondering why the jobmanager requires it in session cluster mode. When a taskmanager registers with the jobmanager, it reports its resources (CPU, memory, etc.).
{code:java}
2020-06-17 18:06:25,079 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [main] - Could not start cluster entrypoint TitusSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint TitusSessionClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:516)
    at com.netflix.spaas.runtime.TitusSessionClusterEntrypoint.main(TitusSessionClusterEntrypoint.java:103)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:255)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    ... 2 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Cannot read memory size from config option 'taskmanager.memory.process.size'.
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:234)
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:100)
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79)
    at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109)
    at org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpecBuilder.build(TaskExecutorProcessSpecBuilder.java:58)
    at org.apache.flink.runtime.resourcemanager.WorkerResourceSpecFactory.workerResourceSpecFromConfigAndCpu(WorkerResourceSpecFactory.java:37)
    at com.netflix.spaas.runtime.resourcemanager.TitusWorkerResourceSpecFactory.createDefaultWorkerResourceSpec(TitusWorkerResourceSpecFactory.java:17)
    at org.apache.flink.runtime.resourcemanager.ResourceManagerRuntimeServicesConfiguration.fromConfiguration(ResourceManagerRuntimeServicesConfiguration.java:67)
    at com.netflix.spaas.runtime.resourcemanager.TitusResourceManagerFactory.createResourceManager(TitusResourceManagerFactory.java:53)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167)
    ... 9 more
Caused by: java.lang.IllegalArgumentException: Could not parse value '7500}' for key 'taskmanager.memory.process.size'.
    at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:753)
    at org.apache.flink.configuration.Configuration.get(Configuration.java:738)
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:232)
    ... 18 more
Caused by: java.lang.IllegalArgumentException: Memory size unit '}' does not match any of the recognized units: (b | bytes) / (k | kb | kibibytes) / (m | mb | mebibytes) / (g | gb | gibibytes) / (t | tb | tebibytes)
    at org.apache.flink.configuration.MemorySize.parseUnit(MemorySize.java:331)
    at org.apache.flink.configuration.MemorySize.parseBytes(MemorySize.java:306)
    at org.apache.flink.configuration.MemorySize.parse(MemorySize.java:247)
    at org.apache.flink.configuration.Configuration.convertToMemorySize(Configuration.java:951)
    at org.apache.flink.configuration.Configuration.convertValue(Configuration.java:885)
    at org.apache.flink.configuration.Configuration.lambda$getOptional$2(Configuration.java:750)
    at java.util.Optional.map(Optional.java:215)
    at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:750)
    ... 20 more
{code}
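Note that the innermost cause in the trace is a parse failure on the value '7500}' (unit '}' not recognized), which suggests the jobmanager did find a value for the key but could not interpret it. The following is only a minimal, self-contained sketch of "number plus unit" parsing to illustrate why a trailing '}' would be rejected. It is not Flink's `MemorySize` implementation; the class `MemorySizeSketch` and its method are hypothetical, and treating a bare number as bytes is an assumption of the sketch.

```java
import java.util.Locale;

// Hypothetical stand-in for a "number + unit" memory-size parser; NOT Flink's MemorySize.
final class MemorySizeSketch {

    /** Parses text such as "7500m" or "2 gb" into bytes; a bare number is treated as bytes here. */
    static long parseBytes(String text) {
        String trimmed = text.trim();
        // Consume the leading run of digits.
        int pos = 0;
        while (pos < trimmed.length() && Character.isDigit(trimmed.charAt(pos))) {
            pos++;
        }
        if (pos == 0) {
            throw new IllegalArgumentException("Text does not start with a number: '" + text + "'");
        }
        long value = Long.parseLong(trimmed.substring(0, pos));
        // Everything after the digits is interpreted as the unit.
        String unit = trimmed.substring(pos).trim().toLowerCase(Locale.ROOT);
        switch (unit) {
            case "": case "b": case "bytes": return value;
            case "k": case "kb": case "kibibytes": return value << 10;
            case "m": case "mb": case "mebibytes": return value << 20;
            case "g": case "gb": case "gibibytes": return value << 30;
            case "t": case "tb": case "tebibytes": return value << 40;
            default:
                throw new IllegalArgumentException(
                        "Memory size unit '" + unit + "' does not match any recognized unit");
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("7500m")); // a well-formed value
        try {
            parseBytes("7500}"); // the value reported in the trace
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this reading is right, it may be worth checking how the value for `taskmanager.memory.process.size` is rendered into the configuration that the jobmanager loads, for example for a stray closing brace.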
We extend WorkerResourceSpecFactory, similar to KubernetesWorkerResourceSpecFactory.
{code:java}
public class TitusWorkerResourceSpecFactory extends WorkerResourceSpecFactory {

    public static final TitusWorkerResourceSpecFactory INSTANCE =
            new TitusWorkerResourceSpecFactory();

    @Override
    public WorkerResourceSpec createDefaultWorkerResourceSpec(Configuration configuration) {
        return workerResourceSpecFromConfigAndCpu(configuration, getDefaultCpus(configuration));
    }

    @VisibleForTesting
    static CPUResource getDefaultCpus(Configuration configuration) {
        double fallback = Double.valueOf(System.getenv("TITUS_NUM_CPU"));
        return TaskExecutorProcessUtils.getCpuCoresWithFallback(configuration, fallback);
    }
}
{code}
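As a side note on the factory above, `Double.valueOf(System.getenv("TITUS_NUM_CPU"))` throws a NullPointerException when the variable is unset. A minimal sketch of a more defensive lookup is shown below. The class `CpuFallbackSketch`, the method `parseCpuFallback`, and the default of 1.0 CPU are hypothetical choices for illustration and are not part of Flink or of our entrypoint.

```java
import java.util.Map;

// Hypothetical helper: reads a CPU count from an environment map with a default.
final class CpuFallbackSketch {

    /** Returns the numeric value of env[name], or defaultCpus if it is unset or blank. */
    static double parseCpuFallback(Map<String, String> env, String name, double defaultCpus) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultCpus; // variable not set: use the assumed default
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            // Fail with a message that names the variable and the offending value.
            throw new IllegalArgumentException(
                    "Environment variable " + name + " is not a number: '" + raw + "'", e);
        }
    }

    public static void main(String[] args) {
        // 1.0 is an assumed default for this sketch only.
        System.out.println(parseCpuFallback(System.getenv(), "TITUS_NUM_CPU", 1.0));
    }
}
```

Passing the environment as a map keeps the helper easy to test without touching the real process environment.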