Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/12/17 19:55:46 UTC

[jira] [Commented] (FLINK-3184) Decrease Akka timeouts on cluster side to make system more responsive

    [ https://issues.apache.org/jira/browse/FLINK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062548#comment-15062548 ] 

ASF GitHub Bot commented on FLINK-3184:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1468

    [FLINK-3184] [timeouts] Decrease timeouts

    This PR introduces a client side timeout of 60 s and a cluster side timeout of 10 s. Both timeouts can be configured via `akka.client.timeout` and `akka.ask.timeout` in the configuration.
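
    As a rough illustration, the two keys named above might be set in the Flink configuration file (assumed here to be `conf/flink-conf.yaml`) as follows. The values simply mirror the defaults described in this PR, and the exact duration syntax accepted is an assumption, not something stated in the PR text:

    ```yaml
    # Sketch only: values mirror the defaults proposed in this PR.
    # Cluster-side timeout for Akka ask calls (futures)
    akka.ask.timeout: 10 s
    # Client-side timeout; higher so the client can wait out a cluster recovery
    akka.client.timeout: 60 s
    ```

    Keeping the client value above the cluster value reflects the reasoning in the issue description: the client has to wait until the cluster has recovered.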

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink decreaseAkkaTimeout

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1468.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1468
    
----
commit 754c0c408d92e931218a137f388fb77f51df964a
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-12-15T14:15:12Z

    Harmonize config key for number of retries and retry delay

commit dd81da02ca6eaf8e0e38cf4511e26cb553c71f72
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-12-15T16:34:17Z

    Add missing param descriptions to FlinkYarnCluster, remove implicit timeout from ApplicationClient

commit 5e967bf8a9ba066be73905338acfd5deb4894602
Author: Till Rohrmann <tr...@apache.org>
Date:   2015-12-15T16:37:20Z

    [FLINK-3184] [timeouts] Set default cluster side timeout to 10 s and the client side timeout to 60 s.
    
    Adapt Akka failure detector timings to respect new 10 s Akka ask timeout. Add logging statements to JobClientActor
    
    Introduce separation between client and cluster timeout
    
    Sets the cluster timeout to 10 s and the client timeout to 60 s.

----


> Decrease Akka timeouts on cluster side to make system more responsive
> ---------------------------------------------------------------------
>
>                 Key: FLINK-3184
>                 URL: https://issues.apache.org/jira/browse/FLINK-3184
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.0.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> Currently, the default timeout for futures is set to 100 s. This is also the time used to wait in between restart attempts if no other value has been explicitly specified. Especially in the streaming case, it is often necessary to detect and react to failures in a shorter period than 100 s. Therefore, I propose to decrease the default timeout to 10 s.
> Additionally, I propose to introduce a slightly higher timeout for the client side (e.g. 60 s). The reason is that in case of a {{JobManager}} failure the client has to wait until the cluster has recovered. Recovery via ZooKeeper can take longer than 10 s, in which case an ongoing recovery could be falsely recognized as a lost connection.


