Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/09 14:40:01 UTC

[jira] [Commented] (FLINK-7400) off-heap limits set too conservatively in cluster environments

    [ https://issues.apache.org/jira/browse/FLINK-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120012#comment-16120012 ] 

ASF GitHub Bot commented on FLINK-7400:
---------------------------------------

GitHub user NicoK opened a pull request:

    https://github.com/apache/flink/pull/4506

    [FLINK-7400][cluster] fix off-heap limits set too conservatively in cluster environments

    ## What is the purpose of the change
    
    Since #3648, `ContaineredTaskManagerParameters` sets `offHeapSize` to exactly the amount of memory Flink itself will use off-heap, and this value is then passed as `-XX:MaxDirectMemorySize` in various cases, e.g. on YARN or Mesos. It does not account for any off-heap use by components other than Flink, e.g. RocksDB, other libraries, or the JVM itself.
    
    Please note that this affects at least all batch programs with the following options set (which do not make much sense for streaming):
    ```
    taskmanager.memory.off-heap=true
    taskmanager.memory.size=<any value>
    taskmanager.memory.preallocate=true
    ```
    If `taskmanager.memory.fraction` is used instead, programs may be safe due to https://issues.apache.org/jira/browse/FLINK-7401, but the additional buffer gained that way may be too small, especially if RocksDB or other libraries using off-heap memory are in use.
    
    This PR adds the `cutoff` from the `containerized.heap-cutoff-ratio`/`containerized.heap-cutoff-min` configuration parameters to `offHeapSize` as implied by the description of these two options.
    
    ## Brief change log
    
    - include the cut-off memory (removed from the container memory size for further calculations) in the off-heap part
    - add a unit test verifying the bug fix in a YARN environment
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    
    - added `YARNSessionCapacitySchedulerITCase#perJobYarnClusterOffHeap()` test that validates that we have enough memory available and the bounds are not too strict
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes: memory calculations)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (JavaDocs)
    

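As a rough illustration of the memory calculation described in the pull request above, the following Java sketch compares the direct-memory limit before and after the change. It is not Flink's actual code: the class and method names are hypothetical, and the container size, cut-off ratio, cut-off minimum, and Flink off-heap size used in `main` are example values chosen only for this sketch.

```java
// Hypothetical sketch, not taken from Flink's code base.
final class CutoffSketch {

    /**
     * Memory (in MB) removed from the container size before the rest is
     * split into heap and Flink-managed off-heap memory. Assumed here to be
     * the larger of (container size * ratio) and the configured minimum.
     */
    static long cutoffMB(long containerMB, double cutoffRatio, long cutoffMinMB) {
        if (cutoffRatio <= 0.0 || cutoffRatio >= 1.0) {
            throw new IllegalArgumentException("cut-off ratio must be in (0, 1)");
        }
        long cutoff = Math.max((long) (containerMB * cutoffRatio), cutoffMinMB);
        if (cutoff >= containerMB) {
            throw new IllegalArgumentException("cut-off exceeds container memory");
        }
        return cutoff;
    }

    /** Limit before the change: only Flink's own off-heap memory. */
    static long directLimitBeforeMB(long flinkOffHeapMB) {
        return flinkOffHeapMB;
    }

    /** Limit after the change: Flink's off-heap memory plus the cut-off. */
    static long directLimitAfterMB(long flinkOffHeapMB, long cutoffMB) {
        return flinkOffHeapMB + cutoffMB;
    }

    /** JVM heap: whatever remains after the cut-off and Flink's off-heap part. */
    static long heapMB(long containerMB, long cutoffMB, long flinkOffHeapMB) {
        return containerMB - cutoffMB - flinkOffHeapMB;
    }

    public static void main(String[] args) {
        long containerMB = 4096;    // example container size
        double ratio = 0.25;        // example cut-off ratio
        long minMB = 600;           // example cut-off minimum
        long flinkOffHeapMB = 1024; // example Flink-managed off-heap size

        long cutoff = cutoffMB(containerMB, ratio, minMB);
        System.out.println("cut-off MB:             " + cutoff);
        System.out.println("heap MB:                " + heapMB(containerMB, cutoff, flinkOffHeapMB));
        System.out.println("direct limit before MB: " + directLimitBeforeMB(flinkOffHeapMB));
        System.out.println("direct limit after MB:  " + directLimitAfterMB(flinkOffHeapMB, cutoff));
    }
}
```

In this sketch the heap size is the same before and after the change; only the value that would be handed to `-XX:MaxDirectMemorySize` grows by the cut-off, which leaves headroom for off-heap users other than Flink.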

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NicoK/flink flink-7400

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4506.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4506
    
----
commit 60d40cde20686b4b1b2d15dc838b15ed0cd994cc
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-08-09T09:53:03Z

    [FLINK-7400][cluster] fix cut-off memory not used for off-heap reserve as intended
    
    + fix description of `containerized.heap-cutoff-ratio`

commit 4135a223288608444d324da333cfdd70117c796d
Author: Nico Kruber <ni...@data-artisans.com>
Date:   2017-08-09T14:16:31Z

    [FLINK-7400][yarn] add an integration test for yarn container memory restrictions using off-heap memory

----


> off-heap limits set too conservatively in cluster environments
> --------------------------------------------------------------
>
>                 Key: FLINK-7400
>                 URL: https://issues.apache.org/jira/browse/FLINK-7400
>             Project: Flink
>          Issue Type: Bug
>          Components: Cluster Management, Mesos, YARN
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2
>            Reporter: Nico Kruber
>            Assignee: Nico Kruber
>
> Since FLINK-6217, {{ContaineredTaskManagerParameters}} sets {{offHeapSize}} to exactly the amount of memory Flink itself will use off-heap, and this value is then passed as {{-XX:MaxDirectMemorySize}} in various cases. It does not account for any off-heap use by components other than Flink, e.g. RocksDB, other libraries, or the JVM itself.
> We should add the {{cutoff}} from the {{CONTAINERIZED_HEAP_CUTOFF_RATIO}} configuration parameter to {{offHeapSize}}, as implied by the description of what this parameter is for.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)