Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/11 07:44:00 UTC

[jira] [Commented] (FLINK-10848) Flink's Yarn ResourceManager can allocate too many excess containers

    [ https://issues.apache.org/jira/browse/FLINK-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16682768#comment-16682768 ] 

ASF GitHub Bot commented on FLINK-10848:
----------------------------------------

suez1224 opened a new pull request #7078: [FLINK-10848][YARN] properly remove YARN ContainerRequest upon container allocation success
URL: https://github.com/apache/flink/pull/7078
 
 
   ## What is the purpose of the change
   
   Properly remove YARN ContainerRequest upon container allocation success.
   
   
   ## Brief change log
   
     - add a call to removeContainerRequest in YarnFlinkResourceManager
     - add a call to removeContainerRequest in YarnResourceManager
     - change the unit tests to verify these calls
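The bookkeeping pattern behind this change can be sketched with a toy request table. This is a minimal sketch, assuming outstanding requests are grouped by resource profile: for every allocated container, exactly one matching pending request is removed, and a container that matches no pending request is counted as excess. `PendingRequestTable`, `add`, `outstanding`, `onContainersAllocated` and the string profile key are hypothetical names for illustration only. In Flink the removal goes through AMRMClient's `removeContainerRequest`, as the change log above states.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for AM-side request bookkeeping; NOT the Hadoop AMRMClient API.
class PendingRequestTable {
    // resource profile key -> outstanding request ids, oldest first
    private final Map<String, Deque<String>> pending = new HashMap<>();

    private static String key(int memoryMb, int vcores) {
        return memoryMb + "MB/" + vcores + "vcores";
    }

    void add(String requestId, int memoryMb, int vcores) {
        pending.computeIfAbsent(key(memoryMb, vcores), k -> new ArrayDeque<>()).addLast(requestId);
    }

    int outstanding() {
        int total = 0;
        for (Deque<String> queue : pending.values()) {
            total += queue.size();
        }
        return total;
    }

    /**
     * Handles one allocation round; each container is given as {memoryMb, vcores}.
     * Removes one matching pending request per container (the analogue of
     * removeContainerRequest) and returns how many containers had no match.
     */
    int onContainersAllocated(int[][] containers) {
        int excess = 0;
        for (int[] container : containers) {
            Deque<String> queue = pending.get(key(container[0], container[1]));
            if (queue != null && !queue.isEmpty()) {
                queue.pollFirst(); // request satisfied: stop asking for it
            } else {
                excess++;          // nothing was waiting for this container
            }
        }
        return excess;
    }
}
```

For example, after adding two requests for 1024 MB / 1 vcore and receiving three such containers, `onContainersAllocated` reports one excess container and `outstanding()` drops to 0, so the next heartbeat no longer asks for them.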
   
   
   ## Verifying this change
   
   This change is already covered by existing tests.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------


> Flink's Yarn ResourceManager can allocate too many excess containers
> --------------------------------------------------------------------
>
>                 Key: FLINK-10848
>                 URL: https://issues.apache.org/jira/browse/FLINK-10848
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.3.3, 1.4.2, 1.5.5, 1.6.2
>            Reporter: Shuyi Chen
>            Assignee: Shuyi Chen
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, neither YarnFlinkResourceManager nor YarnResourceManager calls removeContainerRequest() when a container has been allocated successfully. Because the YARN AM-RM protocol is not a delta protocol (see YARN-1902), AMRMClient keeps every ContainerRequest that has been added and sends all of them to the RM.
> In production, we observed the following, which confirms this theory: 16 containers are allocated and used at cluster startup; when a TM is killed, 17 containers are allocated, 1 is used, and 16 excess containers are returned; when another TM is killed, 18 containers are allocated, 1 is used, and 17 excess containers are returned.
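The numbers in this report can be reproduced with a toy model of the non-delta ask. It is a deliberately simplified sketch, assuming a single resource profile and an RM that satisfies the whole outstanding ask in one round: the client keeps a running count of outstanding requests and, whenever that count changes, sends the absolute total, which the RM takes as its new pending count and decrements as it allocates. `NonDeltaAskModel`, `addRequests`, `allocate` and `scenario` are hypothetical names; this is not the Hadoop AMRMClient API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the non-delta AM-RM ask (YARN-1902) for one resource profile.
// Hypothetical illustration only; NOT the Hadoop AMRMClient API.
class NonDeltaAskModel {
    private final boolean removeOnAllocation; // true = behaviour with the fix
    private int clientOutstanding = 0;        // requests held by the AM-side client
    private int rmPending = 0;                // RM's current view of the ask

    NonDeltaAskModel(boolean removeOnAllocation) {
        this.removeOnAllocation = removeOnAllocation;
    }

    /** The AM adds n requests; the client re-sends its absolute total, not a delta. */
    void addRequests(int n) {
        clientOutstanding += n;
        rmPending = clientOutstanding;
    }

    /**
     * The RM allocates everything it believes is pending; the AM uses `needed`
     * containers and returns the rest. Returns {allocated, used, excess}.
     */
    int[] allocate(int needed) {
        int allocated = rmPending;
        rmPending = 0; // the RM decrements its own table as it allocates
        if (removeOnAllocation) {
            // the fix: remove one request per allocated container,
            // and the changed total is re-sent to the RM
            clientOutstanding -= allocated;
            rmPending = clientOutstanding;
        }
        int used = Math.min(needed, allocated);
        return new int[] {allocated, used, allocated - used};
    }

    /** Startup with 16 TaskManagers, then two TM failures, each replaced by one new request. */
    static List<int[]> scenario(boolean removeOnAllocation) {
        NonDeltaAskModel model = new NonDeltaAskModel(removeOnAllocation);
        List<int[]> rounds = new ArrayList<>();
        model.addRequests(16);
        rounds.add(model.allocate(16));
        for (int failure = 0; failure < 2; failure++) {
            model.addRequests(1);
            rounds.add(model.allocate(1));
        }
        return rounds;
    }
}
```

With `scenario(false)` the rounds are {16, 16, 0}, {17, 1, 16} and {18, 1, 17}, matching the production observation above, because the 16 startup requests stay in the client's table and are re-sent with every new request. With `scenario(true)` the rounds are {16, 16, 0}, {1, 1, 0} and {1, 1, 0}.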



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)