Posted to issues@flink.apache.org by "Junrui Li (Jira)" <ji...@apache.org> on 2023/03/15 04:39:00 UTC

[jira] [Commented] (FLINK-31457) Support waiting for required resources in DefaultScheduler during job restart

    [ https://issues.apache.org/jira/browse/FLINK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700503#comment-17700503 ] 

Junrui Li commented on FLINK-31457:
-----------------------------------

[~a.pilipenko] In what scenario is `NoResourceAvailableException` reported after a job restart? Could you describe it in detail?

IIUC, in a session cluster the slots may be taken by other jobs once the slot idle timeout expires. You could try increasing `slot.idle.timeout`.
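
A minimal sketch of that change in `flink-conf.yaml`, assuming the option takes a value in milliseconds. The number below is only illustrative and should be tuned for the cluster:

```yaml
# flink-conf.yaml
# Illustrative value (an assumption, not a recommendation):
# keep idle slots in the slot pool for 5 minutes before releasing them.
slot.idle.timeout: 300000
```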

In addition, the adaptive scheduler can wait for resources because it adjusts parallelism dynamically and runs the job with a lower parallelism when resources are insufficient. The default scheduler has no such capability, so it reports `NoResourceAvailableException` when resources are insufficient. If you want jobs to run even when resources are insufficient, you can use the adaptive scheduler for streaming jobs.
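
As a rough sketch, switching a streaming job to the adaptive scheduler would look something like this in `flink-conf.yaml`. The timeout values are illustrative assumptions, not tuned recommendations:

```yaml
# flink-conf.yaml
# Use the adaptive scheduler (streaming jobs only).
jobmanager.scheduler: adaptive

# Illustrative values (assumptions): how long to wait for the minimum
# required resources, and how long to wait for resources to stabilize
# before running with whatever is available.
jobmanager.adaptive-scheduler.resource-wait-timeout: 5 min
jobmanager.adaptive-scheduler.resource-stabilization-timeout: 10 s
```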

> Support waiting for required resources in DefaultScheduler during job restart
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-31457
>                 URL: https://issues.apache.org/jira/browse/FLINK-31457
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.3
>            Reporter: Aleksandr Pilipenko
>            Priority: Major
>
> Currently Flink supports [waiting for required resources to become available|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout] during job restart only when using the adaptive scheduler.
> On the other hand, if the cluster is using the default scheduler and there are not enough slots available, restart attempts will fail with `NoResourceAvailableException` until resource requirements are satisfied.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)