Posted to issues@flink.apache.org by "Bill Liu (JIRA)" <ji...@apache.org> on 2017/02/01 23:53:51 UTC
[jira] [Comment Edited] (FLINK-5668) Reduce dependency on HDFS at job startup time
[ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849138#comment-15849138 ]
Bill Liu edited comment on FLINK-5668 at 2/1/17 11:52 PM:
----------------------------------------------------------
Thanks [~wheat9] for filling in the full context.
YARN's own fault tolerance and high availability rely on HDFS, but that doesn't mean Flink-on-YARN has to depend on HDFS.
In particular, some of the HDFS dependencies are not necessary at all.
For the TaskManager configuration file, I took a close look at the code.
The TaskManager config is cloned from baseConfig with only a few small changes applied to it:
```
// Call site: derive the TaskManager configuration from the base configuration.
final Configuration taskManagerConfig = BootstrapTools.generateTaskManagerConfiguration(
    config, akkaHostname, akkaPort, slotsPerTaskManager, TASKMANAGER_REGISTRATION_TIMEOUT);

// BootstrapTools: clone the base configuration and override only a few keys.
public static Configuration generateTaskManagerConfiguration(
        Configuration baseConfig,
        String jobManagerHostname,
        int jobManagerPort,
        int numSlots,
        FiniteDuration registrationTimeout) {

    Configuration cfg = baseConfig.clone();

    cfg.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, jobManagerHostname);
    cfg.setInteger(ConfigConstants.JOB_MANAGER_IPC_PORT_KEY, jobManagerPort);
    cfg.setString(ConfigConstants.TASK_MANAGER_MAX_REGISTRATION_DURATION, registrationTimeout.toString());

    // -1 means "not specified": keep whatever the base configuration says.
    if (numSlots != -1) {
        cfg.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots);
    }

    return cfg;
}
```
[~StephanEwen],
If the JobManager web server is not a good place to share files, the JobManager doesn't need to create a local taskmanager-config.yaml at all; it could just pass the base config file plus some dynamic properties that override the corresponding values in the base config.
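To make the dynamic-properties idea concrete, here is a rough sketch of what I have in mind. It is not existing Flink code: the class and method names are hypothetical, plain maps stand in for Flink's Configuration, and the key strings are what I believe the ConfigConstants above resolve to, so please treat them as assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: ship only the overridden values as -Dkey=value
// properties instead of writing a second config file.
public class DynamicPropertiesSketch {

    /** Collects the same values that generateTaskManagerConfiguration sets on the clone. */
    static Map<String, String> taskManagerOverrides(
            String jobManagerHostname, int jobManagerPort, int numSlots, String registrationTimeout) {
        Map<String, String> overrides = new LinkedHashMap<>();
        // Key strings are assumed values of the corresponding ConfigConstants.
        overrides.put("jobmanager.rpc.address", jobManagerHostname);
        overrides.put("jobmanager.rpc.port", Integer.toString(jobManagerPort));
        overrides.put("taskmanager.maxRegistrationDuration", registrationTimeout);
        if (numSlots != -1) {
            overrides.put("taskmanager.numberOfTaskSlots", Integer.toString(numSlots));
        }
        return overrides;
    }

    /** Renders the overrides as one -Dkey=value argument per entry (kept as a list, so no shell quoting is involved). */
    static List<String> toDynamicProperties(Map<String, String> overrides) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    /** TaskManager side: start from the base config and let the overrides win. */
    static Map<String, String> applyOverrides(Map<String, String> base, Map<String, String> overrides) {
        Map<String, String> effective = new LinkedHashMap<>(base);
        effective.putAll(overrides);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> base = new LinkedHashMap<>();
        base.put("jobmanager.rpc.address", "localhost");
        base.put("taskmanager.heap.mb", "1024");

        Map<String, String> overrides = taskManagerOverrides("jm-host", 6123, 4, "5 minutes");
        System.out.println(toDynamicProperties(overrides));
        System.out.println(applyOverrides(base, overrides));
    }
}
```

With this approach only the base config file has to reach the container; the handful of values that differ travel on the TaskManager launch command, so no extra file needs to be shared.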
> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>
> Key: FLINK-5668
> URL: https://issues.apache.org/jira/browse/FLINK-5668
> Project: Flink
> Issue Type: Improvement
> Components: YARN
> Reporter: Bill Liu
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml with the TaskManagers.
> It would be better to serve taskmanager-conf.yaml from the JobManager web server instead of HDFS, which would reduce the HDFS dependency at job startup.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)