
Posted to issues@flink.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/10/20 17:56:00 UTC

[jira] [Updated] (FLINK-19721) Speed up the frequency of checks in RpcGatewayRetriever

     [ https://issues.apache.org/jira/browse/FLINK-19721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-19721:
-----------------------------------
    Labels: pull-request-available  (was: )

> Speed up the frequency of checks in RpcGatewayRetriever
> -------------------------------------------------------
>
>                 Key: FLINK-19721
>                 URL: https://issues.apache.org/jira/browse/FLINK-19721
>             Project: Flink
>          Issue Type: Improvement
>          Components: Test Infrastructure
>    Affects Versions: 1.12.0, 1.11.1, 1.11.2
>            Reporter: Dan Hill
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When writing Flink tests, I could reduce the latency of my 'waitForDone' calls by writing my own looping retry-sleep logic rather than relying on `TableResult.getJobClient().get().getJobExecutionResult(...)`.  This is because `[MiniCluster|https://github.com/apache/flink/blob/47ca19a74e11c72842124852875262959477c459/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L338]` uses [RpcGatewayRetriever|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/impl/RpcGatewayRetriever.java], which retries at a fixed 20ms interval.
>  
> For a complex test, this can save 50ms-100ms per test run.
>  
> An easy fix is to change this to a retry with exponential backoff.  This reduces the impact 
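
For illustration only, here is a minimal sketch of the kind of retry-sleep loop with exponential backoff described in the issue. It is not the code in RpcGatewayRetriever or in the linked pull request; the class name `BackoffPoller`, its method names, and the 1 ms starting delay are hypothetical. The 20 ms cap in the usage note mirrors the fixed retry interval mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition, doubling the sleep after each miss. */
final class BackoffPoller {
    private BackoffPoller() {}

    /** Returns the sleep schedule: starts at initialMillis, doubles, capped at maxMillis. */
    static List<Long> delaysMillis(long initialMillis, long maxMillis, int attempts) {
        List<Long> delays = new ArrayList<>();
        long delay = initialMillis;
        for (int i = 0; i < attempts; i++) {
            delays.add(delay);
            delay = Math.min(delay * 2, maxMillis);
        }
        return delays;
    }

    /**
     * Polls done until it returns true or timeoutMillis elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(
            BooleanSupplier done, long initialMillis, long maxMillis, long timeoutMillis) {
        if (initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("need 0 < initialMillis <= maxMillis");
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long delay = initialMillis;
        while (true) {
            if (done.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delay, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
            delay = Math.min(delay * 2, maxMillis);
        }
    }
}
```

With an assumed 1 ms start and a 20 ms cap, `BackoffPoller.delaysMillis(1, 20, 7)` yields 1, 2, 4, 8, 16, 20, 20. A condition that becomes true quickly is noticed after a few milliseconds, while a slow one is still polled no more often than the current fixed interval.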



--
This message was sent by Atlassian Jira
(v8.3.4#803005)