Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2019/06/07 12:41:00 UTC

[jira] [Commented] (FLINK-12736) ResourceManager may release TM with allocated slots

    [ https://issues.apache.org/jira/browse/FLINK-12736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858589#comment-16858589 ] 

Till Rohrmann commented on FLINK-12736:
---------------------------------------

As a corollary, new partitions could also be stored on the TM if the TM can have allocated slots by the time the callback is processed. I guess that in order to solve this problem properly, we would need something like a message counter between the RM and the TM. Only if the message counter is still the same as it was before the partition check message was sent can we be sure that nothing has changed on the TM.
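
A minimal sketch of the message counter idea, to make the comparison concrete. Everything here is an assumption for illustration: MessageCounterSketch, TaskManagerRecord, onStateChangingMessage and mayRelease are hypothetical names and not part of Flink, and the sketch assumes the RM can observe every message that may change the TM's state.

```java
/** Hypothetical sketch; none of these types exist in Flink. */
final class MessageCounterSketch {

    /** RM-side bookkeeping for a single TM (assumed name). */
    static final class TaskManagerRecord {
        private long messageCounter;

        /** Call for every message that may change the TM's state, e.g. a slot request. */
        void onStateChangingMessage() {
            messageCounter++;
        }

        long messageCounter() {
            return messageCounter;
        }
    }

    /**
     * Decides in the partition check callback whether the TM may be released.
     * The TM is released only if it holds no partitions and the counter has not
     * moved since the snapshot taken before the check was sent.
     */
    static boolean mayRelease(long counterBeforeCheck, long counterInCallback, boolean holdsPartitions) {
        return !holdsPartitions && counterBeforeCheck == counterInCallback;
    }

    public static void main(String[] args) {
        TaskManagerRecord tm = new TaskManagerRecord();
        long before = tm.messageCounter();   // snapshot before sending the partition check
        tm.onStateChangingMessage();         // a slot is allocated while the check is in flight
        // The callback reports "no partitions", but the counter has moved on.
        System.out.println(mayRelease(before, tm.messageCounter(), false)); // prints "false"
    }
}
```

The partition check itself is deliberately not counted here, since it only reads the TM's state. Whether the counter should live on the RM, on the TM, or on both is left open in this sketch.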

> ResourceManager may release TM with allocated slots
> ---------------------------------------------------
>
>                 Key: FLINK-12736
>                 URL: https://issues.apache.org/jira/browse/FLINK-12736
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.0
>            Reporter: Chesnay Schepler
>            Priority: Critical
>             Fix For: 1.9.0
>
>
> The {{ResourceManager}} looks out for TaskManagers that have not had any slots allocated on them for a while, as these could be released to save resources. If such a TM is found, the RM checks via an RPC call whether the TM still holds any partitions. If no partition is held, the TM is released.
> However, in the RPC callback no check is made whether the TM is actually _still_ idle. In the meantime a slot could've been allocated on the TM.
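
The race in the description can be illustrated with a small sketch that re-checks idleness inside the RPC callback. This is not Flink's actual code: IdleRecheckSketch, TaskManagerRecord and releaseIfStillIdle are hypothetical names, the CompletableFuture stands in for the partition check RPC, and the sketch assumes that all callbacks run on a single thread.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; none of these types exist in Flink. */
final class IdleRecheckSketch {

    /** RM-side view of a single TM (assumed name). */
    static final class TaskManagerRecord {
        private int allocatedSlots;
        private boolean released;

        void allocateSlot() { allocatedSlots++; }
        boolean isIdle() { return allocatedSlots == 0; }
        void release() { released = true; }
        boolean isReleased() { return released; }
    }

    /**
     * Invoked once the TM has been idle for a while. The future stands in for
     * the RPC that asks the TM whether it still holds any partitions.
     */
    static void releaseIfStillIdle(TaskManagerRecord tm, CompletableFuture<Boolean> holdsPartitions) {
        if (!tm.isIdle()) {
            return; // not idle, nothing to do
        }
        holdsPartitions.thenAccept(hasPartitions -> {
            // Re-check idleness here: a slot may have been allocated while the RPC was in flight.
            if (!hasPartitions && tm.isIdle()) {
                tm.release();
            }
        });
    }

    public static void main(String[] args) {
        TaskManagerRecord tm = new TaskManagerRecord();
        CompletableFuture<Boolean> rpc = new CompletableFuture<>();
        releaseIfStillIdle(tm, rpc);
        tm.allocateSlot();     // slot allocated before the RPC answer arrives
        rpc.complete(false);   // the TM reports that it holds no partitions
        System.out.println(tm.isReleased()); // prints "false"
    }
}
```

Without the second isIdle() call in the callback, the TM in this example would be released even though a slot had just been allocated on it.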



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)