Posted to dev@flink.apache.org by "Zhanghao Chen (Jira)" <ji...@apache.org> on 2022/11/21 03:11:00 UTC
[jira] [Created] (FLINK-30101) YARN client should
Zhanghao Chen created FLINK-30101:
-------------------------------------
Summary: YARN client should
Key: FLINK-30101
URL: https://issues.apache.org/jira/browse/FLINK-30101
Project: Flink
Issue Type: Improvement
Components: Client / Job Submission
Affects Versions: 1.16.0
Reporter: Zhanghao Chen
Fix For: 1.17.0
*Problem*
Currently, retrieving a cluster client for Flink on YARN works as follows (see the YarnClusterDescriptor#retrieve method):
# Get application report from YARN
# Set rest.address & rest.port using the info from application report
# Create a new RestClusterClient using the updated configuration. If HA is enabled, the client uses the client HA service to fetch the rest.address & rest.port.
The use of client HA in step 3 is redundant, because we have already obtained rest.address & rest.port from the YARN application report. When ZooKeeper (ZK) HA is enabled, initializing the client HA services and fetching the REST address & port takes ~1.5 s.
1.5 s can mean a lot for latency-sensitive client operations. At my company, we use the Flink client to submit short-running session jobs, and end-to-end (e2e) latency is critical. Job submission takes around 10 s on average, so saving 1.5 s would cut that time by about 15%.
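Steps 1 and 2 above can be pictured with a minimal, self-contained sketch. It is not Flink or YARN code: AppReport and the Map-based configuration are hypothetical stand-ins for the YARN application report and Flink's Configuration, and the host name is made up. Only the keys rest.address and rest.port come from the description above; high-availability is assumed here just to show that HA stays enabled in the configuration.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of steps 1 and 2; none of these types are Flink or YARN classes. */
final class RestEndpointFromReport {

    /** Hypothetical stand-in for the two fields read from the YARN application report. */
    static final class AppReport {
        final String host;
        final int port;

        AppReport(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    /** Step 2: copy the configuration and set rest.address / rest.port from the report. */
    static Map<String, String> withRestEndpoint(Map<String, String> conf, AppReport report) {
        Map<String, String> updated = new HashMap<>(conf);
        updated.put("rest.address", report.host);
        updated.put("rest.port", Integer.toString(report.port));
        return updated;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("high-availability", "zookeeper"); // assumed: HA remains enabled

        // Step 1 (modelled): a report that would normally come from YARN.
        AppReport report = new AppReport("jm-host.example.com", 8081);

        Map<String, String> updated = withRestEndpoint(conf, report);
        System.out.println(updated.get("rest.address") + ":" + updated.get("rest.port"));
    }
}
```

After this step the REST endpoint is fully known on the client side, which is why a further lookup through HA services adds latency without adding information.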
*Proposal*
When retrieving a Flink on YARN cluster client, use StandaloneClientHAServices to create the RestClusterClient instead, since rest.address & rest.port have already been fetched from the YARN application report. This is also what KubernetesClusterDescriptor does.
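The idea behind the proposal can be sketched as follows. This is a simplified model under stated assumptions, not the actual change: ClientHaServices, DiscoveringHaServices and FixedHaServices are hypothetical stand-ins for Flink's client HA services (the fixed variant plays the role of StandaloneClientHAServices), and the http:// URL format is an assumption.

```java
import java.util.function.Supplier;

/** Simplified model of the proposal; none of these types are Flink classes. */
final class ClientHaChoiceSketch {

    /** Hypothetical stand-in for client-side HA services: resolves the REST endpoint. */
    interface ClientHaServices {
        String restEndpoint();
    }

    /** Models discovery via an external store such as ZooKeeper; counts remote lookups. */
    static final class DiscoveringHaServices implements ClientHaServices {
        private final Supplier<String> remoteLookup;
        int lookups = 0;

        DiscoveringHaServices(Supplier<String> remoteLookup) {
            this.remoteLookup = remoteLookup;
        }

        @Override
        public String restEndpoint() {
            lookups++; // each call stands for a round trip to the HA store
            return remoteLookup.get();
        }
    }

    /** Models the standalone variant: returns the endpoint it was constructed with. */
    static final class FixedHaServices implements ClientHaServices {
        private final String endpoint;

        FixedHaServices(String endpoint) {
            this.endpoint = endpoint;
        }

        @Override
        public String restEndpoint() {
            return endpoint; // no remote call needed
        }
    }

    /** Builds the fixed variant from the pre-fetched rest.address and rest.port values. */
    static ClientHaServices fromPrefetched(String restAddress, int restPort) {
        return new FixedHaServices("http://" + restAddress + ":" + restPort);
    }

    public static void main(String[] args) {
        ClientHaServices services = fromPrefetched("jm-host.example.com", 8081);
        System.out.println(services.restEndpoint());
    }
}
```

In this model the fixed variant resolves the endpoint without any remote call, which mirrors where the ~1.5 s saving would come from.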
--
This message was sent by Atlassian Jira
(v8.20.10#820010)