Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2016/06/17 12:56:03 UTC

[GitHub] flink pull request #1954: [FLINK-3190] failure rate restart strategy

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1954#discussion_r67505399
  
    --- Diff: docs/apis/streaming/fault_tolerance.md ---
    @@ -338,6 +342,77 @@ The default value is the value of *akka.ask.timeout*.
     
     {% top %}
     
    +### Failure Rate Restart Strategy
    +
    +The failure rate restart strategy restarts the job after a failure, but when the `failure rate` (failures per time unit) is exceeded, the job fails.
    +Between two consecutive restart attempts, the restart strategy waits a fixed amount of time.
    +
    +Setting the following configuration parameter in `flink-conf.yaml` makes this strategy the default.
    +
    +~~~
    +restart-strategy: failure-rate
    +~~~
    +
    +<table class="table table-bordered">
    +  <thead>
    +    <tr>
    +      <th class="text-left" style="width: 40%">Configuration Parameter</th>
    +      <th class="text-left" style="width: 40%">Description</th>
    +      <th class="text-left">Default Value</th>
    +    </tr>
    +  </thead>
    +  <tbody>
    +    <tr>
    +        <td><i>restart-strategy.failure-rate.max-failures-per-unit</i></td>
    +        <td>Maximum number of restarts within the given time unit before the job is failed</td>
    +        <td>1</td>
    +    </tr>
    +    <tr>
    +        <td><i>restart-strategy.failure-rate.failure-rate-unit</i></td>
    +        <td>Time unit for measuring the failure rate. One of the java.util.concurrent.TimeUnit values</td>
    +        <td>MINUTES</td>
    +    </tr>
    +    <tr>
    +        <td><i>restart-strategy.failure-rate.delay</i></td>
    +        <td>Delay between two consecutive restart attempts</td>
    +        <td><i>akka.ask.timeout</i></td>
    +    </tr>
    +  </tbody>
    +</table>
    +
    +~~~
    +restart-strategy.failure-rate.max-failures-per-unit: 3
    +restart-strategy.failure-rate.failure-rate-unit: MINUTES
    --- End diff --
    
    Wouldn't it also make sense to let the user specify an arbitrary interval for the maximum number of failures, rather than only a time unit? I think this would be more flexible.
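To make the difference concrete, here is a minimal Java sketch, assuming sliding-window semantics (the diff does not say whether the window slides or resets at unit boundaries). `FailureRateTracker`, `perUnit` and `onFailure` are hypothetical names, not Flink API, and the restart delay is not modeled. A time unit is just the special case of an interval of length one unit.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Flink's implementation: allows at most maxFailures
// failures within any sliding window of length `interval`.
class FailureRateTracker {
    private final int maxFailures;
    private final long intervalMillis;
    // Failure timestamps (ms) still inside the window, oldest first.
    private final Deque<Long> failureTimestamps = new ArrayDeque<>();

    FailureRateTracker(int maxFailures, Duration interval) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.maxFailures = maxFailures;
        this.intervalMillis = interval.toMillis();
    }

    // Behaviour described in the diff: the window is exactly one TimeUnit, e.g. one minute.
    static FailureRateTracker perUnit(int maxFailuresPerUnit, TimeUnit unit) {
        return new FailureRateTracker(maxFailuresPerUnit, Duration.ofMillis(unit.toMillis(1)));
    }

    // Records a failure at nowMillis; returns true if the job may be restarted,
    // false if the failure rate is exceeded and the job should fail.
    boolean onFailure(long nowMillis) {
        // Evict failures that have slid out of the window.
        while (!failureTimestamps.isEmpty()
                && nowMillis - failureTimestamps.peekFirst() >= intervalMillis) {
            failureTimestamps.pollFirst();
        }
        failureTimestamps.addLast(nowMillis);
        return failureTimestamps.size() <= maxFailures;
    }
}
```

For the example configuration in the diff (`max-failures-per-unit: 3`, `failure-rate-unit: MINUTES`), `perUnit(3, TimeUnit.MINUTES)` allows restarts for the first three failures within any one-minute window and rejects a fourth, while `new FailureRateTracker(3, Duration.ofMinutes(5))` expresses the more flexible interval. A sliding window also avoids the fixed-bucket edge case where up to twice the limit can occur around a bucket boundary.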

