Posted to issues@flink.apache.org by wangzhijiang999 <gi...@git.apache.org> on 2017/01/13 08:49:20 UTC

[GitHub] flink pull request #3113: [FLINK-4912] Introduce RECONCILIATING state in Exe...

GitHub user wangzhijiang999 opened a pull request:

    https://github.com/apache/flink/pull/3113

    [FLINK-4912] Introduce RECONCILIATING state in ExecutionGraph and Exe…

    This is part of the non-disruptive JobManager failure recovery.
    
    Add a {{RECONCILING}} value to both JobStatus and ExecutionState.
    If a job is started on a JobManager for master recovery, the job status and all of its executions transition to the {{RECONCILING}} state.
    
    From {{RECONCILING}}, an execution can go to any existing task state (once the execution is reconciled with the TaskManager).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangzhijiang999/flink FLINK-4912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3113
    
----
commit 0fbd628b9b8817fd1b71faca92d87c56213d79f6
Author: 淘江 <ta...@alibaba-inc.com>
Date:   2017-01-13T08:41:37Z

    [FLINK-4912] Introduce RECONCILIATING state in ExecutionGraph and Execution for JobManager failure recovery

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3113: [FLINK-4912] Introduce RECONCILIATING state in Exe...

Posted by wangzhijiang999 <gi...@git.apache.org>.
Github user wangzhijiang999 commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3113#discussion_r95975088
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/ExecutionState.java ---
    @@ -25,16 +25,23 @@
      * <pre>{@code
      *
      *     CREATED  -> SCHEDULED -> DEPLOYING -> RUNNING -> FINISHED
    - *                     |            |          |
    - *                     |            |   +------+
    - *                     |            V   V
    - *                     |         CANCELLING -----+----> CANCELED
    - *                     |                         |
    - *                     +-------------------------+
    + *            |         |            |          |
    + *            |         |            |   +------+
    + *            |         |            V   V
    + *            |         |         CANCELLING -----+----> CANCELED
    + *            |         |                         |
    + *            |        +-------------------------+
    + *            |
    + *            |                                   ... -> FAILED
    + *           V
    + *    RECONCILING  -> RUNNING | FINISHED | CANCELED | FAILED
      *
    - *                                               ... -> FAILED
      * }</pre>
      *
    + * <p>It is possible to enter the {@code RECONCILING} state from {@code CREATED}
    --- End diff --
    
    Thank you for the formatting suggestions. :)


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    I would like to take a look at this soon...


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    Thanks for the PR. This looks good to me. I'm not too familiar with the FLIP-6 plans though. I would wait for someone who is in on that to merge this.


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    Will merge this.
    To make it properly robust, I will add some tests that validate the transitions of the state machine...


[GitHub] flink pull request #3113: [FLINK-4912] Introduce RECONCILIATING state in Exe...

Posted by uce <gi...@git.apache.org>.
Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3113#discussion_r95960106
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/execution/ExecutionState.java ---
    @@ -25,16 +25,23 @@
      * <pre>{@code
      *
      *     CREATED  -> SCHEDULED -> DEPLOYING -> RUNNING -> FINISHED
    - *                     |            |          |
    - *                     |            |   +------+
    - *                     |            V   V
    - *                     |         CANCELLING -----+----> CANCELED
    - *                     |                         |
    - *                     +-------------------------+
    + *            |         |            |          |
    + *            |         |            |   +------+
    + *            |         |            V   V
    + *            |         |         CANCELLING -----+----> CANCELED
    + *            |         |                         |
    + *            |        +-------------------------+
    + *            |
    + *            |                                   ... -> FAILED
    + *           V
    + *    RECONCILING  -> RUNNING | FINISHED | CANCELED | FAILED
      *
    - *                                               ... -> FAILED
      * }</pre>
      *
    + * <p>It is possible to enter the {@code RECONCILING} state from {@code CREATED}
    --- End diff --
    
    I think it would be best to move this paragraph below the other paragraphs (below line 48).
    
    A general note: I think for Javadocs it's enough to just have the opening `<p>` tag like this:
    
    ```
    <p>Start here...
    ...continue, with no closing tag at the end.
    ```


[GitHub] flink pull request #3113: [FLINK-4912] Introduce RECONCILIATING state in Exe...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/3113


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    Given that this only extends the enum and does not add changes to the state transitions, we can merge this.


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    Considering the possible state transitions:
    
    ## ExecutionState
      - `RECONCILING` can only be entered from `CREATED`
    
    Simple:
      - `RECONCILING` can go to `RUNNING` if the task was reconciled
      - `RECONCILING` can go to `FAILED` if the task was not reconciled
    
    Complex:
      - For `RECONCILING` to go to `FINISHED` or `CANCELED`, the TaskManager that has the task would have to report (when registering at the JobManager) a task that is no longer executing. To do that, the TaskManager would need to "remember" tasks that completed but for which it did not get an acknowledgement from the JobManager for the execution state update. Is that anticipated?
    
    ## JobStatus
      - `RECONCILING` can only be entered from `CREATED`
    
    Simple:
      - `RECONCILING` can go to `RUNNING` - if all TaskManagers report their status and tasks as running
      - `RECONCILING` can go to `FAILING` - if not all tasks were reported.
    
    Complex:
      - For `RECONCILING` to go to `FINISHED`, the `ExecutionState` would also need to be able to go to `FINISHED`.
    
    What do you think about only doing the "simple" option in the first version?
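
The "simple" transitions listed above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code in the pull request: the class `ReconcileSketch`, the enums `ExecState` and `JobState`, and the methods `canTransition` and `resolveJob` are hypothetical names, and only the states relevant to reconciliation are modeled.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ReconcileSketch {

    /** Hypothetical subset of execution states; not Flink's ExecutionState enum. */
    enum ExecState { CREATED, RECONCILING, RUNNING, FINISHED, FAILED }

    /** Hypothetical subset of job states; not Flink's JobStatus enum. */
    enum JobState { RECONCILING, RUNNING, FAILING }

    /** Targets allowed by the "simple" option. Other transitions out of CREATED are omitted. */
    static Set<ExecState> targets(ExecState from) {
        switch (from) {
            case CREATED:
                return EnumSet.of(ExecState.RECONCILING);
            case RECONCILING:
                // RUNNING if the task was reconciled, FAILED if it was not.
                return EnumSet.of(ExecState.RUNNING, ExecState.FAILED);
            default:
                return EnumSet.noneOf(ExecState.class);
        }
    }

    static boolean canTransition(ExecState from, ExecState to) {
        return targets(from).contains(to);
    }

    /**
     * Decides where a RECONCILING job goes: RUNNING if every expected task
     * was reported as running, FAILING if any task is missing or not running.
     */
    static JobState resolveJob(int expectedTasks, List<ExecState> reported) {
        boolean allRunning = reported.size() == expectedTasks
                && reported.stream().allMatch(s -> s == ExecState.RUNNING);
        return allRunning ? JobState.RUNNING : JobState.FAILING;
    }

    public static void main(String[] args) {
        System.out.println(canTransition(ExecState.CREATED, ExecState.RECONCILING));
        System.out.println(canTransition(ExecState.RECONCILING, ExecState.FINISHED));
        System.out.println(resolveJob(2, Arrays.asList(ExecState.RUNNING, ExecState.RUNNING)));
        System.out.println(resolveJob(2, Arrays.asList(ExecState.RUNNING)));
    }
}
```

In this sketch `RECONCILING -> FINISHED` is rejected, which reflects the "simple" option only; supporting the "complex" cases would mean adding `FINISHED` and `CANCELED` to the target set for `RECONCILING`.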


[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by wangzhijiang999 <gi...@git.apache.org>.
Github user wangzhijiang999 commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    @StephanEwen, thank you for the concrete suggestions. Sorry for the delayed response; it was due to the Chinese Spring Festival holiday.
    
    I have considered and added some tests that validate the state transitions of the state machine. They relate to the later parts of the recovery process and will be submitted together with them in follow-up PRs.
    
    I totally agree with your analysis of the possible state transitions above, and I plan to give a detailed explanation of my implementation in another JIRA issue soon. The implementation is actually a bit complex, so I am trying to break it down into smaller pieces that can be reviewed and merged quickly.



[GitHub] flink issue #3113: [FLINK-4912] Introduce RECONCILIATING state in ExecutionG...

Posted by wangzhijiang999 <gi...@git.apache.org>.
Github user wangzhijiang999 commented on the issue:

    https://github.com/apache/flink/pull/3113
  
    @StephanEwen, I have created #5703, which describes the recovery process in more detail and may cover your considerations. I look forward to your further response, thank you!

