Posted to issues@flink.apache.org by "Till Rohrmann (JIRA)" <ji...@apache.org> on 2018/06/18 15:20:00 UTC

[jira] [Updated] (FLINK-9583) Wrong number of TaskManagers' slots after recovery.

     [ https://issues.apache.org/jira/browse/FLINK-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann updated FLINK-9583:
---------------------------------
    Fix Version/s: 1.6.0

> Wrong number of TaskManagers' slots after recovery.
> ---------------------------------------------------
>
>                 Key: FLINK-9583
>                 URL: https://issues.apache.org/jira/browse/FLINK-9583
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager
>    Affects Versions: 1.5.0
>         Environment: Flink 1.5.0 on YARN with the default execution mode.
>            Reporter: Truong Duc Kien
>            Priority: Major
>             Fix For: 1.6.0
>
>         Attachments: jm.log
>
>
> We started a job with 120 slots, using a FixedDelayRestart strategy with a delay of 1 minute.
> During recovery, some, but not all, slots were released.
> When the job restarted, Flink requested a new batch of slots.
> The total number of slots is now 193, which exceeds the configured 120, and the excess slots are never released.
>  
> This bug does not occur in legacy mode. I've attached the JobManager log.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)