Posted to common-issues@hadoop.apache.org by "Elek, Marton (JIRA)" <ji...@apache.org> on 2019/01/22 13:22:00 UTC

[jira] [Created] (HADOOP-16064) Load configuration values from external sources

Elek, Marton created HADOOP-16064:
-------------------------------------

             Summary: Load configuration values from external sources
                 Key: HADOOP-16064
                 URL: https://issues.apache.org/jira/browse/HADOOP-16064
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Elek, Marton


This is a proposal to improve Configuration.java so that it can load configuration from external sources (a Kubernetes config map, an external HTTP request, any cluster manager such as Ambari, etc.).

I will attach a patch to illustrate the proposed solution, but please comment on the concept first; the patch is just a proof of concept and is not fully implemented.

*Goals:*
 * Load the configuration files (core-site.xml/hdfs-site.xml/...) from external locations instead of the classpath (classpath remains the default)
 * Make the configuration loading extensible
 * Do it in a backward-compatible way with minimal changes to the existing Configuration.java

*Use-cases:*

 1.) Load configuration from the namenode (http://namenode:9878/conf). With this approach only the namenode needs to be configured; the other components require only the URL of the namenode.

 2.) Read configuration directly from kubernetes config-map (or mesos)

 3.) Read configuration from any external cluster management (such as Apache Ambari or any equivalent)

 4.) As of now, in the Hadoop docker images we transform environment variables (such as HDFS-SITE.XML_fs.defaultFs) to configuration XML files with the help of a Python script. With the proposed implementation it would be possible to read the configuration directly from the system environment variables.

*Problem:*

The existing Configuration.java can read configuration from multiple sources, but most of the time it is used to load predefined config names ("core-site.xml" and "hdfs-site.xml") without an explicit location. In this case the files are loaded from the classpath.

I propose to add an option that defines the default location of core-site.xml and hdfs-site.xml (and of any configuration identified by a string name), so that external sources can be used instead of the classpath.

Loading the configuration requires both an implementation and its own configuration (where the external configs are located). We can't use the regular configuration to configure the config loader (chicken-and-egg problem).

I propose to use a new environment variable, HADOOP_CONF_SOURCE.

The environment variable could contain a URL, where the scheme of the URL defines the config source and all the other parts configure the access to the resource.

Examples:

HADOOP_CONF_SOURCE=hadoop-http://namenode:9878/conf

HADOOP_CONF_SOURCE=env://prefix

HADOOP_CONF_SOURCE=k8s://config-map-name
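As a rough illustration of how such a value could be split into "which source" and "how to reach it", the sketch below parses the three example values with java.net.URI. The class name ConfSourceUri and its methods are hypothetical and not part of the patch; a real loader would presumably read the value from System.getenv("HADOOP_CONF_SOURCE") instead of using hard-coded strings.

```java
import java.net.URI;

/** Illustrative only: splits a HADOOP_CONF_SOURCE value into scheme and authority. */
final class ConfSourceUri {

  private ConfSourceUri() { }

  /** Returns the scheme that would select the config source, e.g. "env" or "k8s". */
  static String scheme(String confSource) {
    return URI.create(confSource).getScheme();
  }

  /** Returns the authority part, e.g. the env prefix or the config-map name. */
  static String authority(String confSource) {
    return URI.create(confSource).getAuthority();
  }

  public static void main(String[] args) {
    String[] examples = {
        "hadoop-http://namenode:9878/conf",
        "env://prefix",
        "k8s://config-map-name"
    };
    for (String example : examples) {
      System.out.println(scheme(example) + " -> " + authority(example));
    }
  }
}
```

The scheme alone would be enough to pick an implementation; everything after it stays opaque to the loader and is interpreted by the chosen source.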

The ConfigurationSource interface can be as simple as:
{code:java}
import java.io.IOException;
import java.net.URI;
import java.util.List;

/**
 * Interface to load Hadoop configuration from a custom location.
 */
public interface ConfigurationSource {

  /**
   * Called once with the configured source URL.
   *
   * @param uri location and access parameters of the configuration source
   * @throws IOException if the source cannot be initialized
   */
  void initialize(URI uri) throws IOException;

  /**
   * Called to load a specific configuration resource.
   *
   * @param name name of the configuration resource (e.g. hdfs-site.xml)
   * @return list of the loaded configuration keys and values
   */
  List<ParsedItem> readConfiguration(String name);

}{code}
We can choose the right implementation based on the scheme of the URI, using the Java Service Provider Interface mechanism (META-INF/services/org.apache.hadoop.conf.ConfigurationSource).

This can be done with minimal modification of Configuration.java (see the attached patch as an example).

The patch contains two example implementations:
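One possible shape of that lookup is sketched below. It is not taken from the patch: the interface SchemeAwareSource, its scheme() method and the class SourceSelector are hypothetical names introduced only to show how a scheme could be matched against providers discovered by java.util.ServiceLoader.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ServiceLoader;

/** Hypothetical variant of the proposed interface; scheme() is an assumption. */
interface SchemeAwareSource {
  /** Scheme this source handles, e.g. "env". Not part of the proposal. */
  String scheme();

  void initialize(URI uri) throws IOException;
}

final class SourceSelector {

  private SourceSelector() { }

  /** Picks the first candidate whose scheme matches the URI scheme and initializes it. */
  static SchemeAwareSource select(Iterable<SchemeAwareSource> candidates, URI uri)
      throws IOException {
    for (SchemeAwareSource candidate : candidates) {
      if (candidate.scheme().equalsIgnoreCase(uri.getScheme())) {
        candidate.initialize(uri);
        return candidate;
      }
    }
    throw new IOException("No configuration source registered for " + uri);
  }

  /** Same lookup, with candidates discovered through META-INF/services. */
  static SchemeAwareSource selectFromClasspath(URI uri) throws IOException {
    return select(ServiceLoader.load(SchemeAwareSource.class), uri);
  }
}
```

Keeping the matching logic separate from the ServiceLoader call makes it testable without a META-INF/services file on the classpath.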

*hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*

This implementation can load configuration from environment variables based on a naming convention (e.g. HDFS-SITE.XML_hdfs.dfs.key=value).
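A minimal sketch of such a naming convention follows. It assumes that the variable name is the upper-cased resource name, an underscore, and then the configuration key; the actual rules in Env.java may differ, and the class EnvConfigSketch is a hypothetical name used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: extracts the entries of one config resource from an environment map. */
final class EnvConfigSketch {

  private EnvConfigSketch() { }

  /**
   * Returns the key/value pairs that the given environment holds for one
   * configuration resource, assuming variable names of the form RESOURCE_key.
   */
  static Map<String, String> read(Map<String, String> env, String resourceName) {
    String prefix = resourceName.toUpperCase(Locale.ROOT) + "_";
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : env.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        // Strip the resource prefix; the remainder is the configuration key.
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // System.getenv() returns the process environment as an unmodifiable map.
    System.out.println(read(System.getenv(), "hdfs-site.xml"));
  }
}
```

Passing the environment in as a map, rather than calling System.getenv() inside read, keeps the parsing logic independent of the real process environment.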

*hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*

This implementation can load the configuration from the /conf servlet of any Hadoop component.
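To make the idea concrete, the sketch below parses a response in the usual Hadoop XML layout (configuration/property/name/value) into key/value pairs. It is not the code of HadoopWeb.java: the class ConfXmlSketch is hypothetical, the response format is an assumption, and the HTTP call itself is left out (a real source could obtain the stream from the configured URL, for example with java.net.URL#openStream).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrative only: reads name/value pairs from Hadoop-style configuration XML. */
final class ConfXmlSketch {

  private ConfXmlSketch() { }

  /** Parses every property element of the document into a key/value entry. */
  static Map<String, String> parse(InputStream in) throws Exception {
    Element root = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(in).getDocumentElement();
    NodeList properties = root.getElementsByTagName("property");
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i < properties.getLength(); i++) {
      Element property = (Element) properties.item(i);
      result.put(text(property, "name"), text(property, "value"));
    }
    return result;
  }

  /** Returns the trimmed text of the first child element with the given tag, or "". */
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
  }

  public static void main(String[] args) throws Exception {
    String xml = "<configuration><property><name>fs.defaultFS</name>"
        + "<value>hdfs://namenode:9000</value></property></configuration>";
    System.out.println(parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
  }
}
```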

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
