Posted to issues@flink.apache.org by tillrohrmann <gi...@git.apache.org> on 2017/10/25 13:11:31 UTC

[GitHub] flink pull request #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOS...

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/4903

    [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR

    ## What is the purpose of the change
    
    The AkkaOptions.RETRY_GATE_CLOSED_FOR option allows configuring how long a
    remote ActorSystem is gated after a connection loss. The default value is
    set to 50 ms.
    
    ## Verifying this change
    
    
    This change is a trivial rework / code cleanup without any test coverage.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink hardenJobManagerFailsITCase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4903.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4903
    
----
commit f138e0ac9b41d39b3d15e9892b6757cbf63415c3
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-10-25T10:39:49Z

    [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR
    
    The AkkaOptions.RETRY_GATE_CLOSED_FOR allows to configure how long a remote
    ActorSystem is gated in case of a connection loss. The default value is set
    to 50 ms.

commit f14f100cef678f903d44efc1a77aa9991df3dfca
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-10-25T11:24:41Z

    [hotfix] Speed up JobManagerFailsITCase by decreasing timeout

----


---

[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4903
  
    Change looks good.
    Is `50 ms` also akka's default value?
    Out of curiosity, what triggered the need to introduce this option?


---

[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR

Posted by StephanEwen <gi...@git.apache.org>.
Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4903
  
    Sounds fair, +1
    



---

[GitHub] flink pull request #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOS...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/4903


---

[GitHub] flink issue #4903: [FLINK-7914] Introduce AkkaOptions.RETRY_GATE_CLOSED_FOR

Posted by tillrohrmann <gi...@git.apache.org>.
Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/4903
  
    Akka's default value is actually 5 seconds, which I think is a bit too high.
    
    I was trying to track down an instability in the `JobManagerFailsITCase` and noticed that this test took roughly 16 s to execute (the ITCase contains only 2 tests where we restart the JM). Part of the reason was that Akka gated the JobManager ActorSystem for 5 seconds after we let the JM fail.
    
    The actual solution to speed up this test was not to reuse the same port for the new JobManager system, but I couldn't think of a good reason to keep the 5-second default. Moreover, other tests which run into gated connections could benefit from the lower value as well. I think lowering the gated interval should allow us to reestablish a lost connection faster.
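
As a rough illustration of the gating behaviour described above, the following toy model compares a 5-second gate with a 50 ms gate. It is only a sketch and not Akka's actual implementation: the class name `RetryGate` and its methods are hypothetical, and time is passed in explicitly so the example does not depend on a real clock.

```java
import java.time.Duration;

/** Toy model of a connection gate (hypothetical; not Akka's implementation). */
public class RetryGate {
    private final long gateClosedForMillis;
    // Until a connection loss is recorded, the gate is open.
    private long gatedUntilMillis = Long.MIN_VALUE;

    public RetryGate(Duration gateClosedFor) {
        this.gateClosedForMillis = gateClosedFor.toMillis();
    }

    /** Records a connection loss at the given time and closes the gate. */
    public void onConnectionLost(long nowMillis) {
        gatedUntilMillis = nowMillis + gateClosedForMillis;
    }

    /** Returns true if a reconnect attempt at the given time would be allowed. */
    public boolean mayReconnect(long nowMillis) {
        return nowMillis >= gatedUntilMillis;
    }

    public static void main(String[] args) {
        RetryGate fiveSeconds = new RetryGate(Duration.ofSeconds(5));
        RetryGate fiftyMillis = new RetryGate(Duration.ofMillis(50));

        // Both gates lose their connection at t = 0 ms.
        fiveSeconds.onConnectionLost(0);
        fiftyMillis.onConnectionLost(0);

        // A reconnect attempt 100 ms later is only allowed by the shorter gate.
        long attemptAt = 100;
        System.out.println("5 s gate allows reconnect: " + fiveSeconds.mayReconnect(attemptAt));
        System.out.println("50 ms gate allows reconnect: " + fiftyMillis.mayReconnect(attemptAt));
    }
}
```

In this model, a restarted JobManager reachable 100 ms after the failure would still be rejected by the 5-second gate, while the 50 ms gate has already reopened.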


---