Posted to issues@flink.apache.org by "Dan Hill (Jira)" <ji...@apache.org> on 2020/10/20 02:59:00 UTC

[jira] [Created] (FLINK-19721) Speed up the frequency of checks in RpcGatewayRetriever

Dan Hill created FLINK-19721:
--------------------------------

             Summary: Speed up the frequency of checks in RpcGatewayRetriever
                 Key: FLINK-19721
                 URL: https://issues.apache.org/jira/browse/FLINK-19721
             Project: Flink
          Issue Type: Improvement
          Components: Test Infrastructure
    Affects Versions: 1.11.2, 1.11.1, 1.12.0
            Reporter: Dan Hill


When writing Flink tests, I could reduce the latency of my 'waitForDone' calls by writing my own looping retry-sleep logic rather than relying on `TableResult.getJobClient().get().getJobExecutionResult(...)`.  This is because `[MiniCluster|https://github.com/apache/flink/blob/47ca19a74e11c72842124852875262959477c459/flink-runtime/src/main/java/org/apache/flink/runtime/minicluster/MiniCluster.java#L338]` uses [RpcGatewayRetriever|https://github.com/apache/flink/blob/8674b69964eae50cad024f2c5caf92a71bf21a09/flink-runtime/src/main/java/org/apache/flink/runtime/webmonitor/retriever/impl/RpcGatewayRetriever.java], which retries at a fixed 20ms interval.
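
For illustration, the kind of hand-written retry-sleep loop described above might look roughly like the sketch below. `PollingWait` and `waitUntil` are hypothetical names and not part of Flink; the condition passed in stands for whatever completion check the test already has.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper (not a Flink class): polls a condition with a short sleep. */
final class PollingWait {
    private PollingWait() {}

    /**
     * Checks the condition, sleeping pollInterval between checks, until it
     * holds or the timeout elapses. Returns true if the condition was seen
     * to hold, false on timeout or if the calling thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration pollInterval, Duration timeout) {
        final long deadlineNanos = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadlineNanos - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline; sleep at least 1 ms to avoid a busy spin.
            long remainingMillis = Math.max(1L, remainingNanos / 1_000_000L);
            try {
                Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the caller can react to it.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With a poll interval of a few milliseconds, a loop like this notices completion sooner than a check that only fires every 20ms, at the cost of more frequent polling.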

 

For a complex test, this can save 50ms-100ms per test run.

 

An easy fix is to change this to a retry with exponential backoff.  This reduces the impact 
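
As a rough sketch of what such a backoff schedule could look like (this is not the RpcGatewayRetriever code, and the issue does not specify parameters): the class name `BackoffSchedule`, the 1ms initial delay, and the choice to cap at the current 20ms interval are all assumptions made for illustration.

```java
/** Hypothetical illustration of a capped exponential backoff; not Flink code. */
final class BackoffSchedule {
    private BackoffSchedule() {}

    /**
     * Returns the delay before retry number `attempt` (0-based): the initial
     * delay doubled once per previous attempt, never exceeding maxMillis.
     */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        if (attempt < 0 || initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Compare against maxMillis / 2 first so the doubling cannot overflow.
            delay = (delay > maxMillis / 2) ? maxMillis : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        // Assumed parameters: start at 1 ms, cap at 20 ms.
        for (int attempt = 0; attempt < 7; attempt++) {
            System.out.println("attempt " + attempt + ": " + delayMillis(attempt, 1, 20) + " ms");
        }
    }
}
```

Under these assumed parameters the first few checks happen within a few milliseconds, so a quickly available result is picked up early, while longer waits settle at the same 20ms interval as today.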



--
This message was sent by Atlassian Jira
(v8.3.4#803005)