Posted to dev@flink.apache.org by "Stephan Ewen (JIRA)" <ji...@apache.org> on 2017/05/22 18:39:04 UTC
[jira] [Created] (FLINK-6666) RestartStrategy should differentiate between types of recovery (global / local / resource missing)
Stephan Ewen created FLINK-6666:
-----------------------------------
Summary: RestartStrategy should differentiate between types of recovery (global / local / resource missing)
Key: FLINK-6666
URL: https://issues.apache.org/jira/browse/FLINK-6666
Project: Flink
Issue Type: Sub-task
Components: Distributed Coordination
Affects Versions: 1.3.0
Reporter: Stephan Ewen
Currently, the {{RestartStrategy}} has a single method, which is called whenever a failure requires a restart of the ExecutionGraph.
With the new addition of incremental recovery, it is desirable to distinguish between the different types of failover.
I suggest extending the {{RestartStrategy}} to support three cases, each with its own method:
- {{restartGlobal()}} for a full restart recovery
- {{restartLocal()}} for a recovery coordinated by the {{FailoverStrategy}}
- {{restartOnMissingResources()}} when the failure was caused by missing slots
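A minimal Java sketch of the proposed three-way interface. The name DifferentiatedRestartStrategy and the Runnable-based signatures are hypothetical assumptions, not Flink's actual API. The default methods, which fall back to restartGlobal(), are also an assumption. They would let an existing single-method strategy keep working unchanged.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of the proposed RestartStrategy extension.
 * Names and signatures are illustrative, not Flink's real API.
 */
interface DifferentiatedRestartStrategy {

    /** Full restart of the ExecutionGraph. */
    void restartGlobal(Runnable restarter);

    /** Recovery coordinated by the FailoverStrategy. */
    default void restartLocal(Runnable restarter) {
        // Assumption: unless overridden, behave like a global restart.
        restartGlobal(restarter);
    }

    /** Restart triggered because slots (resources) were missing. */
    default void restartOnMissingResources(Runnable restarter) {
        // Assumption: unless overridden, behave like a global restart.
        restartGlobal(restarter);
    }
}

/** Example strategy that restarts immediately and counts each kind of restart. */
class CountingRestartStrategy implements DifferentiatedRestartStrategy {
    final AtomicInteger global = new AtomicInteger();
    final AtomicInteger local = new AtomicInteger();
    final AtomicInteger missing = new AtomicInteger();

    @Override
    public void restartGlobal(Runnable restarter) {
        global.incrementAndGet();
        restarter.run();
    }

    @Override
    public void restartLocal(Runnable restarter) {
        local.incrementAndGet();
        restarter.run();
    }

    @Override
    public void restartOnMissingResources(Runnable restarter) {
        missing.incrementAndGet();
        restarter.run();
    }
}
```

Because only restartGlobal() is abstract, a legacy strategy can still be written as a lambda, e.g. `DifferentiatedRestartStrategy legacy = r -> r.run();`.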
The last case is interesting, in my opinion, because regular failover should usually happen without delay, whereas failover on missing resources should have a short delay (about 1s) to avoid very fast cycles of restart attempts. In standalone mode, when no resources are available and restarts are not delayed, there can easily be 100,000 restarts within one second.
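The delay policy described above could look roughly like the following Java sketch. DelayOnMissingResourcesStrategy and its constructor are hypothetical names; the three methods mirror the proposal, and the restart action is assumed to be a plain Runnable executed on a java.util.concurrent ScheduledExecutorService. With the suggested 1s delay, a cluster with no free slots retries about once per second instead of the roughly 100,000 attempts per second described above.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Flink API): global and local failover restart
 * immediately, while a restart caused by missing slots is delayed.
 */
class DelayOnMissingResourcesStrategy {
    private final ScheduledExecutorService executor;
    private final long missingResourcesDelayMillis;

    DelayOnMissingResourcesStrategy(ScheduledExecutorService executor, long missingResourcesDelayMillis) {
        if (missingResourcesDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        this.executor = executor;
        this.missingResourcesDelayMillis = missingResourcesDelayMillis;
    }

    /** Full restart: no delay, so recovery starts as soon as possible. */
    void restartGlobal(Runnable restarter) {
        executor.execute(restarter);
    }

    /** Local (FailoverStrategy-coordinated) recovery: also no delay. */
    void restartLocal(Runnable restarter) {
        executor.execute(restarter);
    }

    /**
     * Missing slots: wait before retrying, so that repeated attempts are
     * paced by the delay instead of running in a tight loop.
     */
    void restartOnMissingResources(Runnable restarter) {
        executor.schedule(restarter, missingResourcesDelayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Hypothetical usage: `new DelayOnMissingResourcesStrategy(Executors.newSingleThreadScheduledExecutor(), 1000)`. Keeping the delay on the missing-resources path only means regular failover stays as fast as before.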