Posted to issues@flink.apache.org by "Zhijiang Wang (JIRA)" <ji...@apache.org> on 2017/01/16 10:31:26 UTC

[jira] [Updated] (FLINK-5501) Determine whether the job starts from last JobManager failure

     [ https://issues.apache.org/jira/browse/FLINK-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijiang Wang updated FLINK-5501:
---------------------------------
    Description: 
When the {{JobManagerRunner}} is granted leadership, it should check whether the current job is already running. If the job is running, the {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way.

The {{RunningJobsRegistry}} can provide a way to check whether a job is running, but we should extend the current interface and adjust the related process to support this.

1. {{RunningJobsRegistry}} sets the job status to RUNNING the first time the {{JobManagerRunner}} is granted leadership.

2. When the job finishes, {{RunningJobsRegistry}} sets the job status to FINISHED, and the status is deleted before exit.

3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leader {{JobManagerRunner}} has already finished the job and set the job status to FINISHED, the other {{JobManagerRunner}}s will exit when they are granted leadership.

4. If the {{JobManager}} fails, the job status remains RUNNING. So when a {{JobManagerRunner}} (the previous one or a new one) is granted leadership again, it checks the job status and enters the {{RECONCILING}} state.
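
The four cases above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the actual {{RunningJobsRegistry}} interface: the class JobRegistrySketch, the enums JobStatus and LeaderAction, and the methods onGrantLeadership, markFinished and clear are hypothetical names, and an in-memory map stands in for whatever highly available store a real registry would use.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the proposed check; not Flink's real RunningJobsRegistry. */
class JobRegistrySketch {

    /** Status recorded for a job; a missing entry means the job was never started. */
    enum JobStatus { RUNNING, FINISHED }

    /** What a runner should do once it has been granted leadership. */
    enum LeaderAction { SCHEDULE, RECONCILE, EXIT }

    // Assumption: an in-memory map stands in for a highly available store.
    private final Map<String, JobStatus> statuses = new ConcurrentHashMap<>();

    /** Called when a runner is granted leadership for the given job. */
    LeaderAction onGrantLeadership(String jobId) {
        // Atomically record RUNNING only if no status exists yet (case 1).
        JobStatus previous = statuses.putIfAbsent(jobId, JobStatus.RUNNING);
        if (previous == null) {
            return LeaderAction.SCHEDULE;   // first grant: schedule the ExecutionGraph
        }
        if (previous == JobStatus.RUNNING) {
            return LeaderAction.RECONCILE;  // case 4: an earlier leader failed mid-job
        }
        return LeaderAction.EXIT;           // case 3: another runner already finished the job
    }

    /** Case 2, first half: the leader records that the job has finished. */
    void markFinished(String jobId) {
        statuses.put(jobId, JobStatus.FINISHED);
    }

    /** Case 2, second half: the status is deleted before exit. */
    void clear(String jobId) {
        statuses.remove(jobId);
    }
}
```

In this sketch the first grant for a job returns SCHEDULE, a later grant while the status is still RUNNING returns RECONCILE, and a grant after markFinished returns EXIT. Using putIfAbsent keeps the check and the write in one step, so two runners cannot both conclude that they are the first leader; a real registry would need an equivalent guarantee from its backing store.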

  was:
When the {{JobManagerRunner}} is granted leadership, it should check whether the current job is already running. If the job is running, the {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way.

The {{RunningJobsRegistry}} can provide a way to check whether a job is running, but we should extend the current interface and adjust the related process to support this.

1. {{RunningJobsRegistry}} sets the job status to RUNNING the first time the {{JobManagerRunner}} is granted leadership.
2. When the job finishes, {{RunningJobsRegistry}} sets the job status to FINISHED, and the status is deleted before exit.
3. If the {{JobManager}} fails, the job status remains RUNNING, so when a {{JobManagerRunner}} (the previous one or a new one) is granted leadership again, it checks the job status and enters the {{RECONCILING}} state.


> Determine whether the job starts from last JobManager failure
> -------------------------------------------------------------
>
>                 Key: FLINK-5501
>                 URL: https://issues.apache.org/jira/browse/FLINK-5501
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager
>            Reporter: Zhijiang Wang
>            Assignee: Zhijiang Wang
>
> When the {{JobManagerRunner}} is granted leadership, it should check whether the current job is already running. If the job is running, the {{JobManager}} should reconcile itself (enter the RECONCILING state) and wait for the {{TaskManager}} to report task status. Otherwise, the {{JobManager}} can schedule the {{ExecutionGraph}} in the usual way.
> The {{RunningJobsRegistry}} can provide a way to check whether a job is running, but we should extend the current interface and adjust the related process to support this.
> 1. {{RunningJobsRegistry}} sets the job status to RUNNING the first time the {{JobManagerRunner}} is granted leadership.
> 2. When the job finishes, {{RunningJobsRegistry}} sets the job status to FINISHED, and the status is deleted before exit.
> 3. If the mini cluster starts multiple {{JobManagerRunner}}s and the leader {{JobManagerRunner}} has already finished the job and set the job status to FINISHED, the other {{JobManagerRunner}}s will exit when they are granted leadership.
> 4. If the {{JobManager}} fails, the job status remains RUNNING. So when a {{JobManagerRunner}} (the previous one or a new one) is granted leadership again, it checks the job status and enters the {{RECONCILING}} state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)