You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@flink.apache.org by StephanEwen <gi...@git.apache.org> on 2017/02/24 18:46:03 UTC

[GitHub] flink issue #3340: [FLINK-5703][runtime] ExecutionGraph recovery via reconci...

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3340

There is a lot of good code in this PR.

What I would suggest to make different is to NOT make `Execution`, `IntermediateResult` partition, etc mutable. There is a big benefit to having them immutable, both for safety against accidental bugs, and also from a concurrency standpoint. Regular mutable non-volatile variables have visibility problems in concurrent access, which leads to very subtle bugs that are hard to trace and recover: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility

For `Execution`, I would change it such that the `ExecutionVertex` swaps the current `Execution` (in state `RECONCILING`) with the a new `Execution` that has all the fields properly set. That way there is a quasi atomic replacement of the previous execution, which is safer and easier to reason about.

It may be worth to even create the JobManager (from the JobManagerRunner) and the ExecutionGraph directly in reconciliation mode. That way, you don't have to worry what to do with TaskManagers that register while the JobMaster has not yet switched to reconciliation.

Can you explain a bit why the reconciliation need the two phases:
1. Receive reports, set the execution attempt and state
2. Once all are there, transition the state to running

Could it be done such that on report the `Execution` is directly set to the reported state? Then once all reports are there, everything is just fine. After the reconciliation timeout expires, you would just iterate over all executions and `fail()` the ones that are still in `RECONCILE` - recovery logic takes care of the rest.
That may also handle the case where some already reconciled tasks fail while reconciliation still happens. The ExecutionGraph would already recover the local tasks that are affected by the failure.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---