Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/03 06:28:00 UTC

[jira] [Commented] (FLINK-10884) Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.

    [ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16706723#comment-16706723 ] 

ASF GitHub Bot commented on FLINK-10884:
----------------------------------------

zhijiangW commented on a change in pull request #7185: [FLINK-10884] [yarn/mesos] adjust container memory param to set a safe margin from offheap memory
URL: https://github.com/apache/flink/pull/7185#discussion_r238154920
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java
 ##########
 @@ -158,8 +158,10 @@ public static ContaineredTaskManagerParameters create(
 
 		// (2) split the remaining Java memory between heap and off-heap
 		final long heapSizeMB = TaskManagerServices.calculateHeapSizeMB(containerMemoryMB - cutoffMB, config);
-		// use the cut-off memory for off-heap (that was its intention)
-		final long offHeapSizeMB = containerMemoryMB - heapSizeMB;
+		// (3) try to compute the offHeapMemory from a safe margin
+		final long restMemoryMB = containerMemoryMB - heapSizeMB;
+		final long offHeapCutoffMemory = calculateOffHeapCutoffMB(config, restMemoryMB);
 
 Review comment:
   I agree that both ways can work well. If we introduce the parameter `containerized.offheap-cutoff-ratio`, do you think we should also introduce `containerized.offheap-cutoff-min` to stay consistent with the existing parameters?
   
   I suggest renaming the current `containerized.heap-cutoff-ratio` to `containerized.memory-cutoff-ratio` so that it covers all memory overhead, for the following reasons:
   
   1. Fewer parameters are often better, though not always. If you want to cut off 100 from heap memory and 200 from off-heap memory, you can simply cut off 300 in total. It does not matter, and cannot be controlled, whether the 300 ends up used by heap or off-heap. In practice it can be taken by any memory usage, as long as the total does not exceed the container's physical memory.
   
   2. It is a minimal change that only renames the existing parameter.
   
   Of course, I can also accept separate parameters if you insist. :)
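
   The arithmetic in point 1 can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `UnifiedCutoffSketch`, the method `cutoffMB`, the MB unit, and the sample values are hypothetical and are not taken from the pull request. The rule "larger of ratio times total and a fixed minimum" is likewise assumed here as one plausible way to combine a ratio parameter with a min parameter.

```java
// Hypothetical sketch; names and values are illustrative, not Flink's actual API.
class UnifiedCutoffSketch {

    /** Returns the memory (in MB) to hold back: the larger of ratio * total and a fixed minimum. */
    static long cutoffMB(long containerMemoryMB, double cutoffRatio, long minCutoffMB) {
        if (cutoffRatio <= 0.0 || cutoffRatio >= 1.0) {
            throw new IllegalArgumentException("cutoff ratio must be in (0, 1)");
        }
        long cutoff = Math.max((long) (containerMemoryMB * cutoffRatio), minCutoffMB);
        if (cutoff >= containerMemoryMB) {
            throw new IllegalArgumentException("cutoff leaves no memory for the JVM");
        }
        return cutoff;
    }

    public static void main(String[] args) {
        long containerMemoryMB = 1200; // assumed container size

        // Two separate cutoffs, as in the example: 100 for heap, 200 for off-heap.
        long separateTotalMB = 100 + 200;

        // One combined cutoff that reserves the same total.
        long combinedMB = cutoffMB(containerMemoryMB, 0.25, 0);

        System.out.println("separate total = " + separateTotalMB + " MB");
        System.out.println("combined       = " + combinedMB + " MB");
        System.out.println("left for JVM   = " + (containerMemoryMB - combinedMB) + " MB");
    }
}
```

   With these sample values both approaches hold back the same amount, which is the point of the argument: the container limit only constrains the total, not how the reserved memory is split.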

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Flink on yarn  TM container will be killed by nodemanager because of  the exceeded  physical memory.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10884
>                 URL: https://issues.apache.org/jira/browse/FLINK-10884
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Core
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>         Environment: version  : 1.6.2 
> module : flink on yarn
> centos  jdk1.8
> hadoop 2.7
>            Reporter: wgcn
>            Assignee: wgcn
>            Priority: Major
>              Labels: pull-request-available, yarn
>
> The TM container is killed by the NodeManager because it exceeds the physical memory limit. I found that the launch context used to launch the TM container sets "container memory = heap memory + offHeapSizeMB" in the class org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters, lines 160 to 166. I set a safety margin for the total memory the container uses. For example, if the container limit is 3g of memory, the sum "heap memory + offHeapSizeMB" is set to 2.4g to prevent the container from being killed. Is there a ready-made solution, or can I commit my solution?
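
The 3g to 2.4g example in the description corresponds to holding back 20% of the container limit. A minimal Java sketch of that calculation follows. The class `SafetyMarginSketch`, the method `usableMemoryMB`, and the reading of "3g" as 3000 MB are assumptions made for illustration; this is not the code from the pull request.

```java
// Hypothetical sketch; not the code from the pull request.
class SafetyMarginSketch {

    /** Memory (in MB) the JVM may use after holding back marginPercent of the container limit. */
    static long usableMemoryMB(long containerLimitMB, int marginPercent) {
        if (marginPercent < 0 || marginPercent >= 100) {
            throw new IllegalArgumentException("margin must be in [0, 100)");
        }
        // Integer arithmetic avoids floating-point rounding surprises.
        return containerLimitMB * (100 - marginPercent) / 100;
    }

    public static void main(String[] args) {
        long containerLimitMB = 3000; // "3g" taken as 3000 MB for round numbers (assumption)
        long usableMB = usableMemoryMB(containerLimitMB, 20);
        System.out.println("heap + off-heap budget = " + usableMB + " MB");
    }
}
```

The remaining 20% is left unassigned so that memory used outside the accounted heap and off-heap budgets does not push the process over the container limit.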



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)