Posted to issues@flink.apache.org by "zhijiang (JIRA)" <ji...@apache.org> on 2018/11/21 03:48:03 UTC
[jira] [Comment Edited] (FLINK-10884) Flink on yarn TM container
will be killed by nodemanager because of the exceeded physical memory.
[ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694177#comment-16694177 ]
zhijiang edited comment on FLINK-10884 at 11/21/18 3:47 AM:
------------------------------------------------------------
I just quickly reviewed the related code. My analysis:
In {{ContaineredTaskManagerParameters#create}}, the off-heap size is computed as
{{offHeapSizeMB = containerMemoryMB - heapSizeMB}}
{{containerMemoryMB}} is the container's total physical memory, which includes the {{cutoff}}, while the {{cutoff}} has already been subtracted from {{heapSizeMB}} during calculation. As a result, {{offHeapSizeMB}} ends up including the {{cutoff}}.
In {{testOffHeapMemoryWithDefaultConfiguration}}, {{networkBufMB}} is calculated without the {{cutoff}}, so the {{cutoff}} has to be added to it before comparing against the {{offHeapSizeMB}} above.
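To illustrate the arithmetic above, here is a minimal sketch (not Flink's actual code) of how the cutoff shifts into the off-heap budget. It assumes the documented defaults for {{containerized.heap-cutoff-ratio}} (0.25) and {{containerized.heap-cutoff-min}} (600 MB); the class and method names below are made up for the example.

```java
// Sketch of the container memory split described above.
// Assumes default cutoff ratio 0.25 and minimum cutoff 600 MB
// (containerized.heap-cutoff-ratio / containerized.heap-cutoff-min defaults).
public class ContainerMemorySketch {

    static final double CUTOFF_RATIO = 0.25;
    static final long MIN_CUTOFF_MB = 600;

    /** Safety cutoff: the larger of the minimum and ratio-based values. */
    static long cutoffMB(long containerMemoryMB) {
        return Math.max(MIN_CUTOFF_MB, (long) (containerMemoryMB * CUTOFF_RATIO));
    }

    public static void main(String[] args) {
        long containerMemoryMB = 3072; // e.g. a 3 GB YARN container
        long cutoff = cutoffMB(containerMemoryMB);           // 768

        // The heap size has the cutoff subtracted...
        long heapSizeMB = containerMemoryMB - cutoff;        // 2304

        // ...so offHeapSizeMB = containerMemoryMB - heapSizeMB
        // absorbs the cutoff, which is the point made in the comment.
        long offHeapSizeMB = containerMemoryMB - heapSizeMB; // 768

        System.out.println(cutoff + " " + heapSizeMB + " " + offHeapSizeMB);
    }
}
```

With a 3072 MB container, 768 MB of cutoff is removed from the heap, and exactly that amount reappears in {{offHeapSizeMB}}, so any test comparing against it must add the cutoff back.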
> Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.
> ----------------------------------------------------------------------------------------------------
>
> Key: FLINK-10884
> URL: https://issues.apache.org/jira/browse/FLINK-10884
> Project: Flink
> Issue Type: Bug
> Components: Cluster Management, Core
> Affects Versions: 1.5.5, 1.6.2, 1.7.0
> Environment: version : 1.6.2
> module : flink on yarn
> centos jdk1.8
> hadoop 2.7
> Reporter: wgcn
> Assignee: wgcn
> Priority: Major
> Labels: yarn
>
> TM container will be killed by the nodemanager because of exceeded physical memory. I found in the launch context launching the TM container that "container memory = heap memory + offHeapSizeMB" (see org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters, lines 160 to 166). I set a safety margin for the whole memory the container uses. For example, if the container limit is 3 GB, the sum "heap memory + offHeapSizeMB" is kept to 2.4 GB to prevent the container from being killed. Do we have a ready-made solution, or may I commit my solution?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)