Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2017/10/25 13:11:31 UTC
[GitHub] flink pull request #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOS...
GitHub user tillrohrmann opened a pull request:
https://github.com/apache/flink/pull/4903
[FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
## What is the purpose of the change
The new AkkaOptions.RETRY_GATE_CLOSED_FOR option makes it possible to configure
how long a remote ActorSystem is gated after a connection loss. The default
value is set to 50 ms.
## Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
## Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tillrohrmann/flink hardenJobManagerFailsITCase
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/4903.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4903
----
commit f138e0ac9b41d39b3d15e9892b6757cbf63415c3
Author: Till Rohrmann <tr...@apache.org>
Date: 2017-10-25T10:39:49Z
[FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
The AkkaOptions.RETRY_GATE_CLOSED_FOR allows to configure how long a remote
ActorSystem is gated in case of a connection loss. The default value is set
to 50 ms.
commit f14f100cef678f903d44efc1a77aa9991df3dfca
Author: Till Rohrmann <tr...@apache.org>
Date: 2017-10-25T11:24:41Z
[hotfix] Speed up JobManagerFailsITCase by decreasing timeout
----
---
[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/4903
Change looks good.
Is `50 ms` also akka's default value?
Out of curiosity, what triggered the need to introduce this option?
---
[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/4903
Sounds fair, +1
---
[GitHub] flink pull request #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOS...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/flink/pull/4903
---
[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:
https://github.com/apache/flink/pull/4903
Akka's default value is actually 5 seconds, which I think is a bit too high.
I was actually trying to track down an instability in the `JobManagerFailsITCase` and noticed that this test took roughly 16 s to execute (the ITCase contains only 2 tests where we restart the JM). Part of the reason was that Akka gated the JobManager ActorSystem for 5 seconds after we let the JM fail.
The actual solution to speed up this test was to avoid reusing the same port for the new JobManager system, but I couldn't think of a good reason to keep the 5-second default. Moreover, some other tests that also run into gated connections could benefit from the lower value. I think lowering the gated interval should allow us to reestablish a lost connection faster.
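To make the effect of the gate interval concrete, here is a minimal sketch in Java. It is a simplified model of gating, not Akka's or Flink's actual implementation; the class `RetryGateSketch` and all of its methods are hypothetical names chosen for illustration. It assumes that each JobManager failure closes the gate once for the full configured interval.

```java
/**
 * Hypothetical, simplified model of a gated remote connection.
 * This is NOT Akka's or Flink's implementation; it only illustrates
 * how the gate interval bounds the earliest possible reconnect.
 */
final class RetryGateSketch {
    private final long closedForMillis;
    // No connection loss recorded yet, so the gate starts open.
    private long gatedUntilMillis = Long.MIN_VALUE;

    RetryGateSketch(long closedForMillis) {
        this.closedForMillis = closedForMillis;
    }

    /** Records a connection loss at the given time and closes the gate. */
    void onConnectionLoss(long nowMillis) {
        gatedUntilMillis = nowMillis + closedForMillis;
    }

    /** Returns true if a reconnect attempt at the given time would be allowed. */
    boolean canReconnect(long nowMillis) {
        return nowMillis >= gatedUntilMillis;
    }

    /** Total time spent gated, assuming each loss gates for the full interval. */
    static long totalGatedMillis(int connectionLosses, long closedForMillis) {
        return connectionLosses * closedForMillis;
    }

    public static void main(String[] args) {
        // Two JobManager restarts, as in the ITCase described above.
        System.out.println(totalGatedMillis(2, 5000L)); // prints 10000
        System.out.println(totalGatedMillis(2, 50L));   // prints 100
    }
}
```

Under these assumptions, two restarts with a 5-second gate account for about 10 s of waiting, compared with 0.1 s at the new 50 ms default.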
---