
Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2020/05/08 02:54:00 UTC

[jira] [Commented] (FLINK-17560) No Slots available exception in Apache Flink Job Manager while Scheduling

    [ https://issues.apache.org/jira/browse/FLINK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102192#comment-17102192 ] 

Xintong Song commented on FLINK-17560:
--------------------------------------

Hi [~josson],
Could you provide the complete logs for this issue?

It is somewhat expected that the slot report contains the old job ID, because messages between the JM, RM, and TM are asynchronous. This can happen when the TM sent the slot report before it received the JM's message to release the slots, so the report the RM receives is already out of date. However, this should usually not cause a scheduling failure, because the RM should see that the slots have become available when it receives the next slot report.
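The ordering described above can be illustrated with a toy model. This is a minimal sketch, assuming a heavily simplified view of the protocol. The classes `TaskManager` and `ResourceManager`, the method names, and the "old-job" ID are hypothetical and are not Flink's actual classes or RPCs. The model shows a TM slot report that was taken before a release still lists the old job when the RM processes it, and the next report corrects the RM's view.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical, simplified model; NOT Flink's real classes or RPC methods.


@dataclass
class TaskManager:
    # slot index -> job ID currently holding the slot (None means free)
    slots: Dict[int, Optional[str]]

    def slot_report(self) -> Dict[int, Optional[str]]:
        # Snapshot of the current allocations, as sent to the ResourceManager.
        return dict(self.slots)

    def free_slots_of(self, job_id: str) -> None:
        # Handles the JobManager's request to release all slots of a job.
        for idx, owner in self.slots.items():
            if owner == job_id:
                self.slots[idx] = None


@dataclass
class ResourceManager:
    view: Dict[int, Optional[str]] = field(default_factory=dict)

    def on_slot_report(self, report: Dict[int, Optional[str]]) -> None:
        # The RM replaces its view with whatever the TM reported.
        self.view = dict(report)

    def free_slot_count(self) -> int:
        return sum(1 for owner in self.view.values() if owner is None)


tm = TaskManager(slots={0: "old-job", 1: "old-job", 2: None})
rm = ResourceManager()

# 1. The TM takes a snapshot while the old job still holds two slots.
in_flight_report = tm.slot_report()
# 2. The TM then processes the JM's request to release the old job's slots.
tm.free_slots_of("old-job")
# 3. Only afterwards does the RM process the now-stale report.
rm.on_slot_report(in_flight_report)
stale_free = rm.free_slot_count()  # the RM still sees "old-job" in two slots

# 4. The next report reflects the release, so the RM's view recovers.
rm.on_slot_report(tm.slot_report())
fresh_free = rm.free_slot_count()

print(stale_free, fresh_free)  # 1 3
```

In this model the stale view is temporary. A scheduling failure would require the RM to act on the stale view and never receive a corrected report, which is why the full logs are needed.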

It would help to look at the complete JobManager and TaskManager logs to understand what is going wrong.

> No Slots available exception in Apache Flink Job Manager while Scheduling
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17560
>                 URL: https://issues.apache.org/jira/browse/FLINK-17560
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.8.3
>         Environment: Flink version 1.8.3
> Session cluster
>            Reporter: josson paul kalapparambath
>            Priority: Major
>
> Set up
> ------
> Flink version 1.8.3
> Zookeeper HA cluster
> 1 ResourceManager/Dispatcher (Same Node)
> 1 TaskManager
> 4 pipelines running with various parallelisms
> Issue
> ------
> Occasionally, when the JobManager is restarted, we notice that not all pipelines get scheduled. The error reported by the JobManager is 'not enough slots are available'. This should not happen, because the TaskManager was deployed with enough slots for the number of pipelines and the parallelism we use.
> We also noticed that the slot report sent by the TaskManager contains slots still assigned to the job IDs of old, CANCELLED jobs. I am not sure why the TaskManager still holds the details of the old jobs. A thread dump of the TaskManager confirms that the old pipelines are not running.
> I am aware of https://issues.apache.org/jira/browse/FLINK-12865, but that is not the issue in this case.
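The reporter's reasoning ("enough slots for the number of pipelines/parallelism") can be checked with simple arithmetic. This is a minimal sketch under stated assumptions. The pipeline names, operator parallelisms, and the total slot count of 11 are hypothetical, and each job is assumed to use Flink's default slot sharing (a single slot sharing group), in which a job needs as many slots as its highest operator parallelism. The sketch shows how a few slots still attributed to a cancelled job are enough to turn an exact fit into a "not enough slots" situation.

```python
from typing import Dict, List


def slots_needed(operator_parallelisms: List[int]) -> int:
    # Assumes a single (default) slot sharing group per job: the job needs
    # as many slots as its highest operator parallelism.
    return max(operator_parallelisms)


def free_slots(total_slots: int, stale_allocations: Dict[int, str]) -> int:
    # Slots still attributed to an old job are not offered to new jobs.
    return total_slots - len(stale_allocations)


# Hypothetical setup: four pipelines and their operator parallelisms.
pipelines = {
    "pipeline-a": [1, 4, 4],
    "pipeline-b": [2, 2],
    "pipeline-c": [1, 3],
    "pipeline-d": [2],
}
total_slots = 11  # hypothetical TaskManager slot count

required = sum(slots_needed(p) for p in pipelines.values())  # 4 + 2 + 3 + 2

# Hypothetical slot report entries still holding a CANCELLED job's ID.
stale = {0: "cancelled-job", 1: "cancelled-job"}
available = free_slots(total_slots, stale)

print(required, available, available >= required)  # 11 9 False
```

Without the stale entries, all 11 slots would be free and the four pipelines would fit exactly. With two slots still attributed to the cancelled job, scheduling fails, which matches the symptom described in the report.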



--
This message was sent by Atlassian Jira
(v8.3.4#803005)