Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/09/05 10:21:00 UTC

[jira] [Commented] (FLINK-7580) Let LeaderGatewayRetriever implementations automatically retry failed gateway retrieval operations

    [ https://issues.apache.org/jira/browse/FLINK-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153429#comment-16153429 ] 

ASF GitHub Bot commented on FLINK-7580:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/4643

    [FLINK-7580] Automatically retry failed gateway retrievals

    ## What is the purpose of the change
    
    The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and RpcGatewayRetriever, now automatically retry the gateway retrieval operation a fixed number of times, with a delay between retries, before completing the gateway future with an exception.
    
    ## Verifying this change
    
    This change is already covered by existing tests, such as `FutureUtilsTest` (`retryWithDelay` tests).
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink retryingLeaderGatewayRetrieverImpl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4643
    
----
commit 15358b4acaef2b2a84e23cf21dcade2014df4abb
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-08-15T10:00:58Z

    [FLINK-7458] Generalize GatewayRetriever for WebRuntimeMonitor
    
    Introduce a generalized GatewayRetriever replacing the JobManagerRetriever. The
    GatewayRetriever fulfills the same purpose as the JobManagerRetriever with the
    ability to retrieve the gateway for an arbitrary endpoint type.

commit 1386ea4f56f86970cb0dc8af783144c35fe1e3f3
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-08-15T11:55:47Z

    [FLINK-7459] Generalize Flink's redirection logic
    
    Introduce RedirectHandler which can be extended to add redirection functionality to all
    SimpleInboundChannelHandlers. This allows sharing the same functionality across the
    StaticFileServerHandler and the RuntimeMonitorHandlerBase which could now be removed.
    In the future, the AbstractRestHandler will also extend the RedirectHandler.

commit 9eaf8f6227f571e898fe28b61d89173416bda129
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-08-18T12:29:29Z

    [FLINK-7533] Let LeaderGatewayRetriever retry failed gateway retrievals
    
    Add test case
    
    Only log LeaderGatewayRetriever exception on Debug log level
    
    Properly fail outdated gateway retrieval operations

commit 42cc51b5db800c6776c2e398ea2cae0651b2d49c
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-09-04T14:42:24Z

    [FLINK-7576] [futures] Add FutureUtils.retryWithDelay
    
    FutureUtils.retryWithDelay executes the given operation of type
    Callable<CompletableFuture<T>> n times and waits the given delay between
    retries. This allows retrying an operation with a specified delay.
    
    Make retry and retry with delay future properly cancellable

commit a717cc616af2b3a24fbeb9b70137e0401ea24507
Author: Till Rohrmann <tr...@apache.org>
Date:   2017-09-04T15:57:08Z

    [FLINK-7580] Automatically retry failed gateway retrievals
    
    The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and the
    RpcGatewayRetriever, now automatically retry the gateway retrieval operation
    a fixed number of times with a retry delay before completing the gateway
    future with an exception.
    
    Retry AkkaJobManagerRetriever
    
    Retry RpcGatewayRetriever

----
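
The retry behaviour that the FLINK-7576 and FLINK-7580 commits above describe can be pictured roughly as follows. This is only a minimal sketch built on the JDK's CompletableFuture and ScheduledExecutorService; the class name RetryWithDelaySketch, the parameter names and the millisecond delay are assumptions made for illustration, and the signature is not claimed to match Flink's actual FutureUtils.retryWithDelay.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Flink's FutureUtils: retry an asynchronous
// operation a bounded number of times with a fixed delay between attempts.
final class RetryWithDelaySketch {

    /** Runs the operation once and retries it up to {@code retries} more times on failure. */
    static <T> CompletableFuture<T> retryWithDelay(
            Callable<CompletableFuture<T>> operation,
            int retries,
            long delayMillis,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retries, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Callable<CompletableFuture<T>> operation,
            int retriesLeft,
            long delayMillis,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        // Stop retrying if the caller already completed or cancelled the result.
        if (result.isDone()) {
            return;
        }
        CompletableFuture<T> current;
        try {
            current = operation.call();
        } catch (Exception e) {
            // Treat a synchronous failure like a failed future.
            current = new CompletableFuture<>();
            current.completeExceptionally(e);
        }
        current.whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                // Wait for the delay, then try again with one retry fewer.
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis,
                        TimeUnit.MILLISECONDS);
            } else {
                // Retries exhausted: complete the result with the last failure.
                result.completeExceptionally(failure);
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        try {
            // Simulated gateway lookup that fails twice before succeeding.
            CompletableFuture<String> gateway = retryWithDelay(() -> {
                CompletableFuture<String> f = new CompletableFuture<>();
                if (attempts.incrementAndGet() < 3) {
                    f.completeExceptionally(new IllegalStateException("endpoint not started yet"));
                } else {
                    f.complete("gateway");
                }
                return f;
            }, 5, 50L, scheduler);
            System.out.println(gateway.join() + " after " + attempts.get() + " attempts");
        } finally {
            scheduler.shutdown();
        }
    }
}
```

In this sketch the bound on retries is what guarantees that the returned future eventually completes, either with the gateway or with the last failure.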


> Let LeaderGatewayRetriever implementations automatically retry failed gateway retrieval operations
> --------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-7580
>                 URL: https://issues.apache.org/jira/browse/FLINK-7580
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Coordination, Webfrontend
>    Affects Versions: 1.4.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Minor
>
> The {{LeaderGatewayRetriever}} implementations {{AkkaJobManagerRetriever}} and {{RpcGatewayRetriever}} should automatically retry failed gateway retrieval operations. A retrieval can fail, for example, if the {{WebRuntimeMonitor}} is started before the actual Akka/RPC component. I propose retrying a fixed number of times with a short delay in between. If the resolution still fails after the retries are exhausted, a new retrieval operation will be started the next time information is requested from the {{WebRuntimeMonitor}} (see FLINK-7533). This ensures that the retry operation won't run forever, but also that it will eventually connect to the Akka/RPC component if it exists.
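
The second half of the proposal, starting a fresh retrieval on the next request once the previous one has failed, might look roughly like the sketch below. The class RestartingRetrieverSketch and its method names are hypothetical and do not reflect Flink's actual LeaderGatewayRetriever code; the supplied operation stands in for a bounded retry such as the one FLINK-7576 adds.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: caches the current retrieval future and starts a new
// retrieval on demand if the cached one has completed exceptionally.
final class RestartingRetrieverSketch<T> {

    // Assumed to wrap a retrieval that already retries a bounded number of times.
    private final Supplier<CompletableFuture<T>> retrieval;
    private CompletableFuture<T> current;

    RestartingRetrieverSketch(Supplier<CompletableFuture<T>> retrieval) {
        this.retrieval = retrieval;
        this.current = retrieval.get();
    }

    /** Returns the cached future, restarting the retrieval if the last one failed. */
    synchronized CompletableFuture<T> getFuture() {
        if (current.isCompletedExceptionally()) {
            current = retrieval.get();
        }
        return current;
    }
}
```

Because a new retrieval is only triggered by a caller asking for the gateway, no retry loop runs indefinitely in the background, yet a later request can still succeed once the endpoint is up.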


