Posted to issues@flink.apache.org by "Robert Metzger (Jira)" <ji...@apache.org> on 2020/07/28 15:09:00 UTC

[jira] [Commented] (FLINK-16866) Make job submission non-blocking

    [ https://issues.apache.org/jira/browse/FLINK-16866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166490#comment-17166490 ] 

Robert Metzger commented on FLINK-16866:
----------------------------------------

After an offline discussion with [~trohrmann], we are proposing to address this issue as follows:

Changes to the Dispatcher:
- As part of the Dispatcher, we'll introduce a Job abstraction that tracks the final step of the job submission: the creation of the job manager.
- The job submission REST handler will return immediately after triggering the creation of the job manager.
- During this phase, the job will be in an INITIALIZING state. Once the job manager has started, the job is INITIALIZED.
- Errors during the creation of the job manager are stored in the new Job abstraction, probably in an ArchivedExecutionGraph. Failed submissions need to get evicted eventually.
- Calls to get the job status or cancel the job will have to be adapted to this change, so that they return the INITIALIZING state, properly cancel the job manager creation, or fail the request (for example, when triggering a savepoint on an initializing job). As a follow-up idea, we could improve the cancellation of the initialization by executing it in a thread controlled by the "Job" abstraction, so that we can interrupt the thread (cooperation is not guaranteed).
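
To make the proposed Job abstraction more concrete, here is a minimal sketch of how it could track the asynchronous creation of the job manager. This is not Flink code: the class and method names (DispatcherJobSketch, Job, getJobManagerOrFail) and the FAILED and CANCELLED states are assumptions for illustration only. The proposal itself names just INITIALIZING and INITIALIZED, and the job manager is modelled as a generic type T.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Hypothetical sketch of the proposed Job abstraction; not Flink's actual Dispatcher code. */
public class DispatcherJobSketch {

    /** INITIALIZING and INITIALIZED come from the proposal; FAILED and CANCELLED are assumed. */
    public enum Status { INITIALIZING, INITIALIZED, FAILED, CANCELLED }

    /** Tracks the asynchronous creation of a job manager, modelled here as a generic T. */
    public static final class Job<T> {
        private final CompletableFuture<T> jobManagerFuture;

        public Job(Supplier<T> jobManagerFactory) {
            // Creation runs off the caller's thread, so the submission can return immediately.
            this.jobManagerFuture = CompletableFuture.supplyAsync(jobManagerFactory);
        }

        public Status getStatus() {
            if (!jobManagerFuture.isDone()) {
                return Status.INITIALIZING;
            }
            // A cancelled future also counts as completed exceptionally, so check it first.
            if (jobManagerFuture.isCancelled()) {
                return Status.CANCELLED;
            }
            return jobManagerFuture.isCompletedExceptionally() ? Status.FAILED : Status.INITIALIZED;
        }

        /** Returns the error raised while creating the job manager, if creation failed. */
        public Optional<Throwable> getFailureCause() {
            if (getStatus() != Status.FAILED) {
                return Optional.empty();
            }
            try {
                jobManagerFuture.join();
                return Optional.empty();
            } catch (CompletionException e) {
                return Optional.ofNullable(e.getCause());
            }
        }

        /**
         * Marks the job as cancelled. CompletableFuture#cancel does not interrupt the
         * thread that is still running the factory, so the creation itself may continue.
         */
        public boolean cancel() {
            return jobManagerFuture.cancel(false);
        }

        /** Fails requests (for example a savepoint trigger) that need a started job manager. */
        public T getJobManagerOrFail(String operation) {
            Status status = getStatus();
            if (status != Status.INITIALIZED) {
                throw new IllegalStateException(
                        "Cannot run '" + operation + "' while the job is " + status);
            }
            return jobManagerFuture.join();
        }
    }
}
```

In this sketch, cancel() only marks the job as cancelled. Interrupting the initialization would require running it in a thread owned by the Job, as suggested in the follow-up idea above.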

Changes to other components:
- The web UI will need to handle jobs in the INITIALIZING state differently: initially, the job shall be listed in the "Running" section, but it will either not be clickable or show an almost-empty page explaining that the submission is still pending. Submission errors should be accessible in the UI.
- The CliFrontend will keep its current semantics: after the submission has succeeded, it will periodically query the REST endpoint until the initialization has finished (or failed).
- The ExecutionEnvironment.executeAsync() call will only return a JobClient once the job manager has been initialized.
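
The client-side behaviour described above (keep querying until the job has left the INITIALIZING state) could look roughly like the following sketch. It is an illustration under assumptions, not the CliFrontend implementation: the status query is abstracted as a Supplier instead of a real REST call, and the names SubmissionPollingSketch and awaitInitialization, the FAILED state, and the attempt limit are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a client waiting for job initialization; not Flink's CliFrontend. */
public class SubmissionPollingSketch {

    /** INITIALIZING and INITIALIZED come from the proposal; FAILED is an assumed name. */
    public enum Status { INITIALIZING, INITIALIZED, FAILED }

    /**
     * Queries the job status until it is no longer INITIALIZING and returns that status.
     * The supplier stands in for a request to the REST endpoint.
     */
    public static Status awaitInitialization(
            Supplier<Status> statusQuery, long pollIntervalMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Status status = statusQuery.get();
            if (status != Status.INITIALIZING) {
                return status;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for initialization", e);
            }
        }
        throw new IllegalStateException(
                "Job still initializing after " + maxAttempts + " status queries");
    }
}
```

A caller would submit the job first and then invoke awaitInitialization with a supplier that performs the actual status request. Only when the result is INITIALIZED would it hand a job handle back to the user, which mirrors the executeAsync() behaviour described above.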

> Make job submission non-blocking
> --------------------------------
>
>                 Key: FLINK-16866
>                 URL: https://issues.apache.org/jira/browse/FLINK-16866
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.9.2, 1.10.0, 1.11.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> Currently, Flink waits to acknowledge a job submission until the corresponding {{JobManager}} has been created. Since its creation also involves the creation of the {{ExecutionGraph}} and potential FS operations, it can take a bit of time. If the user has configured too low a {{web.timeout}}, the submission can time out, reporting only a {{TimeoutException}} to the user.
> I propose to change the notion of job submission slightly. Instead of waiting until the {{JobManager}} has been created, a job submission is complete once all job-relevant files have been uploaded to the {{Dispatcher}} and the {{Dispatcher}} has been told about it. Creating the {{JobManager}} will then belong to the actual job execution. Consequently, if problems occur while creating the {{JobManager}}, they will result in a job failure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)