Posted to issues@flink.apache.org by "Xintong Song (JIRA)" <ji...@apache.org> on 2019/07/30 12:33:00 UTC

[jira] [Comment Edited] (FLINK-13477) Containerized TaskManager killed because of lack of memory overhead

    [ https://issues.apache.org/jira/browse/FLINK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896079#comment-16896079 ] 

Xintong Song edited comment on FLINK-13477 at 7/30/19 12:32 PM:
----------------------------------------------------------------

Thanks for bringing this up, [~b.hanotte].

The community is also aware of this problem, and is preparing a new FLIP for improving the whole Flink resource management story, including the JVM settings you mentioned.

Once we have the first version of the FLIP draft, we will post it to the dev mailing list. In the meantime, you can find some of the discussion [here|https://docs.google.com/document/d/1o4KvyyXsQMGUastfPin3ZWeUXWsJgoL7piqp1fFYJvA/edit#], and you are welcome to join it.


> Containerized TaskManager killed because of lack of memory overhead
> -------------------------------------------------------------------
>
>                 Key: FLINK-13477
>                 URL: https://issues.apache.org/jira/browse/FLINK-13477
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Mesos
>    Affects Versions: 1.9.0
>            Reporter: Benoit Hanotte
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the `-XX:MaxDirectMemorySize` parameter is set as:
> `MaxDirectMemorySize = containerMemoryMB - heapSizeMB`
> (see [https://github.com/apache/flink/blob/7fec4392b21b07c69ba15ea554731886f181609e/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java#L162])
> However, as explained at
>  https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html,
> `MaxDirectMemorySize` only sets the maximum amount of memory that can be
> used for direct buffers, thus the amount of off-heap memory used can be
> greater than that value, leading to the container being killed by Mesos
> or Yarn as it exceeds the allocated memory.
> In addition, users might want to allocate off-heap memory through native
> code, in which case they will want to keep some of the container memory
> free and unallocated by Flink.
> To work around this issue, we currently set the following parameter:
> {code:java}
> -Dcontainerized.taskmanager.env.FLINK_ENV_JAVA_OPTS='-XX:MaxDirectMemorySize=600m'
> {code}
> which overrides the value that Flink picks (744M in this case) with a lower one, leaving some overhead memory in the TaskManager containers. However, this is an "ugly" hack: it circumvents the memory allocation that Flink performs and bypasses the sanity checks done in `ContaineredTaskManagerParameters`.
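
To make the arithmetic in the description concrete, here is a minimal sketch in Java. It is not Flink's code: the class name, the method names, and the `reservedOverheadMB` parameter are hypothetical, and the container and heap sizes are assumed values chosen only so that the results match the 744M and 600M figures quoted above.

```java
public final class DirectMemorySketch {

    /** Formula quoted in the issue: all non-heap container memory goes to direct buffers. */
    public static long currentMaxDirectMemoryMB(long containerMemoryMB, long heapSizeMB) {
        return containerMemoryMB - heapSizeMB;
    }

    /**
     * Hypothetical variant: keep reservedOverheadMB of the container outside both the heap
     * and the direct-memory limit, so other off-heap usage (native allocations, JVM
     * internals) has room before the container exceeds its allocation.
     */
    public static long maxDirectMemoryWithOverheadMB(
            long containerMemoryMB, long heapSizeMB, long reservedOverheadMB) {
        long directMB = containerMemoryMB - heapSizeMB - reservedOverheadMB;
        if (directMB <= 0) {
            throw new IllegalArgumentException(
                    "Heap plus reserved overhead leaves no room for direct memory");
        }
        return directMB;
    }

    public static void main(String[] args) {
        long containerMemoryMB = 2048; // assumed container size, not taken from the issue
        long heapSizeMB = 1304;        // assumed heap size, not taken from the issue
        long reservedOverheadMB = 144; // assumed overhead to keep unallocated

        long current = currentMaxDirectMemoryMB(containerMemoryMB, heapSizeMB);
        long withOverhead =
                maxDirectMemoryWithOverheadMB(containerMemoryMB, heapSizeMB, reservedOverheadMB);

        System.out.println("-XX:MaxDirectMemorySize=" + current + "m (current formula)");
        System.out.println("-XX:MaxDirectMemorySize=" + withOverhead + "m (with reserved overhead)");
    }
}
```

With the current formula, heap and direct memory together can fill the entire container, so any further off-heap usage pushes it over the limit. The variant leaves an explicit reserve, which is what the manual `FLINK_ENV_JAVA_OPTS` override above achieves by hand.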


