Posted to issues@flink.apache.org by "zhijiang (Jira)" <ji...@apache.org> on 2019/12/13 04:08:00 UTC

[jira] [Closed] (FLINK-14952) Yarn containers can exceed physical memory limits when using BoundedBlockingSubpartition.

     [ https://issues.apache.org/jira/browse/FLINK-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhijiang closed FLINK-14952.
----------------------------
    Release Note: 
This changes the option key and default value for the type of BoundedBlockingSubpartition in batch jobs. 

The previous key `taskmanager.network.bounded-blocking-subpartition-type` has been renamed to `taskmanager.network.blocking-shuffle.type`.

The respective default value has also been changed from `auto` to `file`, to avoid YARN killing the task manager container when the memory usage of mmap exceeds the container's memory limit.
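For jobs that still want the mmap-based implementation, it can be opted into explicitly under the new key, e.g. in flink-conf.yaml (a sketch; per this change the non-default value is `mmap`):

{noformat}
taskmanager.network.blocking-shuffle.type: mmap
{noformat}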
      Resolution: Fixed

Fixed in master: 7600e8b9d4cb8fee928c9edc9d2483787dc10a3c

Fixed in release-1.10: b52efff51f6494c442e32181a5d6896feec4e990

> Yarn containers can exceed physical memory limits when using BoundedBlockingSubpartition.
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-14952
>                 URL: https://issues.apache.org/jira/browse/FLINK-14952
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Network
>    Affects Versions: 1.9.1
>            Reporter: Piotr Nowojski
>            Assignee: zhijiang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> As [reported by a user on the user mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html], using {{BoundedBlockingSubpartition}} inside YARN containers can cause the container to exceed its physical memory limits.
> {quote}2019-11-19 12:49:23,068 INFO org.apache.flink.yarn.YarnResourceManager - Closing TaskExecutor connection container_e42_1574076744505_9444_01_000004 because: Container [pid=42774,containerID=container_e42_1574076744505_9444_01_000004] is running beyond physical memory limits. Current usage: 12.0 GB of 12 GB physical memory used; 13.9 GB of 25.2 GB virtual memory used. Killing container.
> {quote}
> This is probably happening because the memory usage of mmap is neither capped nor accounted for by the configured memory limits. YARN, however, does track this usage, and once Flink exceeds the threshold, the container is killed.
> The workaround is to override the default value and force Flink not to use mmap, by setting a secret (🤫) config option:
> {noformat}
> taskmanager.network.bounded-blocking-subpartition-type: file
> {noformat}
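> A minimal sketch of applying the same workaround programmatically for a locally executed batch job (assuming the {{Configuration}} and {{ExecutionEnvironment}} APIs; the class name and job skeleton below are hypothetical):
> {code:java}
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.configuration.Configuration;
>
> public class MmapWorkaroundExample {
>     public static void main(String[] args) {
>         Configuration conf = new Configuration();
>         // Force file-based bounded blocking subpartitions instead of mmap.
>         // (Pre-1.10 key; from 1.10 on, use "taskmanager.network.blocking-shuffle.type".)
>         conf.setString("taskmanager.network.bounded-blocking-subpartition-type", "file");
>
>         ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);
>         // ... build the batch job on env and call env.execute() ...
>     }
> }
> {code}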



--
This message was sent by Atlassian Jira
(v8.3.4#803005)