Posted to common-issues@hadoop.apache.org by "Elek, Marton (JIRA)" <ji...@apache.org> on 2019/01/22 13:23:00 UTC

[jira] [Updated] (HADOOP-16064) Load configuration values from external sources

     [ https://issues.apache.org/jira/browse/HADOOP-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elek, Marton updated HADOOP-16064:
----------------------------------
    Attachment: HADOOP-16064.001.patch

> Load configuration values from external sources
> -----------------------------------------------
>
>                 Key: HADOOP-16064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16064
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: HADOOP-16064.001.patch
>
>
> This is a proposal to improve Configuration.java to load configuration from external sources (a Kubernetes config map, an external HTTP request, any cluster manager such as Apache Ambari, etc.).
> I will attach a patch to illustrate the proposed solution, but please comment on the concept first; the patch is just a proof of concept and is not fully implemented.
> *Goals:*
>  * Load the configuration files (core-site.xml/hdfs-site.xml/...) from external locations instead of the classpath (classpath remains the default)
>  * Make the configuration loading extensible
>  * Do this in a backward-compatible way, with minimal changes to the existing Configuration.java
> *Use-cases:*
>  1.) Load configuration from the namenode ([http://namenode:9878/conf]). With this approach only the namenode needs to be configured; the other components require only the URL of the namenode
>  2.) Read configuration directly from kubernetes config-map (or mesos)
>  3.) Read configuration from any external cluster management (such as Apache Ambari or any equivalent)
>  4.) As of now, in the Hadoop docker images we transform environment variables (such as HDFS-SITE.XML_fs.defaultFs) to configuration xml files with the help of a python script. With the proposed implementation it would be possible to read the configuration directly from the system environment variables.
> *Problem:*
> The existing Configuration.java can read configuration from multiple sources, but most of the time it's used to load predefined config names ("core-site.xml" and "hdfs-site.xml") without a configuration location. In this case the files are loaded from the classpath.
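> For reference, this is what the current classpath based loading looks like from the API side (a minimal sketch of the existing behaviour, the key name is just an example):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class ClasspathLoadingExample {
>   public static void main(String[] args) {
>     // Today: resources referenced by name are resolved from the classpath.
>     Configuration conf = new Configuration(); // picks up core-default.xml + core-site.xml
>     conf.addResource("hdfs-site.xml");        // also looked up on the classpath
>     System.out.println(conf.get("fs.defaultFS"));
>   }
> }{code}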
> I propose to add an additional option to define the default location of core-site.xml and hdfs-site.xml (or any configuration which is referenced by a string name), so that they can be loaded from external sources instead of the classpath.
> The configuration loading requires an implementation + configuration (where the external configs are). We can't use the regular configuration to configure the config loader (chicken/egg problem).
> I propose to use a new environment variable, HADOOP_CONF_SOURCE.
> The environment variable could contain a URL, where the scheme of the URL defines the config source and all the other parts configure the access to the resource.
> Examples:
> HADOOP_CONF_SOURCE=hadoop-http://namenode:9878/conf
> HADOOP_CONF_SOURCE=env://prefix
> HADOOP_CONF_SOURCE=k8s://config-map-name
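> The value can be parsed as a standard URI; the scheme part selects the loader (just an illustrative snippet, not the patch code):
> {code:java}
> import java.net.URI;
>
> public class ConfSourceParsing {
>   public static void main(String[] args) {
>     // Illustrative only: split HADOOP_CONF_SOURCE into scheme + the rest.
>     String source = System.getenv("HADOOP_CONF_SOURCE"); // e.g. env://prefix
>     if (source != null) {
>       URI uri = URI.create(source);
>       String scheme = uri.getScheme(); // "env", "k8s", "hadoop-http", ...
>       // the scheme selects the ConfigurationSource implementation,
>       // the remaining parts configure the access to the resource
>       System.out.println(scheme + " -> " + uri.getSchemeSpecificPart());
>     }
>   }
> }{code}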
> The ConfigurationSource interface can be as easy as:
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> import java.util.List;
>
> /**
>  * Interface to load hadoop configuration from a custom location.
>  */
> public interface ConfigurationSource {
>
>   /**
>    * Called once with the configuration url defined by HADOOP_CONF_SOURCE.
>    *
>    * @param uri the source location (the scheme selects the implementation)
>    */
>   void initialize(URI uri) throws IOException;
>
>   /**
>    * Called to load a specific configuration resource.
>    *
>    * @param name of the configuration resource (eg. hdfs-site.xml)
>    * @return List of the loaded configuration keys and values.
>    */
>   List<ParsedItem> readConfiguration(String name);
> }{code}
> We can choose the right implementation based on the scheme of the URI, with the Java Service Provider Interface mechanism (META-INF/services/org.apache.hadoop.conf.ConfigurationSource).
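> A rough sketch of how the lookup could work with java.util.ServiceLoader (the locator class is only illustrative, and it assumes that each implementation also reports the scheme it handles via a getScheme() method, which is not part of the interface above):
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> import java.util.ServiceLoader;
>
> public final class ConfigurationSourceLocator {
>
>   /**
>    * Selects the ConfigurationSource whose scheme matches the given uri.
>    * Implementations are discovered from
>    * META-INF/services/org.apache.hadoop.conf.ConfigurationSource.
>    */
>   public static ConfigurationSource find(URI source) throws IOException {
>     for (ConfigurationSource candidate
>         : ServiceLoader.load(ConfigurationSource.class)) {
>       // getScheme() is assumed here; the interface above does not define it.
>       if (candidate.getScheme().equals(source.getScheme())) {
>         candidate.initialize(source);
>         return candidate;
>       }
>     }
>     return null; // no matching source: fall back to classpath based loading
>   }
> }{code}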
> It can be done with minimal modification in Configuration.java (see the attached patch as an example).
>  The patch contains two example implementations:
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*
> This can load configuration from environment variables based on a naming convention (eg. HDFS-SITE.XML_hdfs.dfs.key=value)
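> A simplified sketch of such an environment variable based source (not the exact code of the patch, only to illustrate the naming convention handling):
> {code:java}
> import java.io.IOException;
> import java.net.URI;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Map;
>
> /**
>  * Reads configuration from environment variables following the
>  * CONFIGFILE.XML_key=value convention (eg. HDFS-SITE.XML_dfs.replication=3).
>  */
> public class Env implements ConfigurationSource {
>
>   @Override
>   public void initialize(URI uri) throws IOException {
>     // nothing to do: the environment is always available
>   }
>
>   @Override
>   public List<ParsedItem> readConfiguration(String name) {
>     String prefix = name.toUpperCase() + "_"; // e.g. "HDFS-SITE.XML_"
>     List<ParsedItem> items = new ArrayList<>();
>     for (Map.Entry<String, String> env : System.getenv().entrySet()) {
>       if (env.getKey().startsWith(prefix)) {
>         items.add(new ParsedItem(
>             env.getKey().substring(prefix.length()), env.getValue()));
>       }
>     }
>     return items;
>   }
> }{code}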
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*
>  This implementation can load the configuration from the /conf servlet of any Hadoop component.
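> A simplified sketch of such a web based source, parsing the xml output of the /conf servlet (error handling and the stripping of the hadoop- scheme prefix are omitted here, the code is only an illustration):
> {code:java}
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.URI;
> import java.net.URL;
> import java.util.ArrayList;
> import java.util.List;
> import javax.xml.parsers.DocumentBuilderFactory;
> import org.w3c.dom.Document;
> import org.w3c.dom.Element;
> import org.w3c.dom.NodeList;
>
> /**
>  * Loads configuration from the /conf servlet of a running Hadoop component.
>  */
> public class HadoopWeb implements ConfigurationSource {
>
>   private URI confUrl;
>
>   @Override
>   public void initialize(URI uri) throws IOException {
>     this.confUrl = uri;
>   }
>
>   @Override
>   public List<ParsedItem> readConfiguration(String name) {
>     // the /conf servlet returns the whole configuration, so name is unused here
>     List<ParsedItem> items = new ArrayList<>();
>     try (InputStream in =
>         new URL(confUrl.toString() + "?format=xml").openStream()) {
>       Document doc = DocumentBuilderFactory.newInstance()
>           .newDocumentBuilder().parse(in);
>       NodeList properties = doc.getElementsByTagName("property");
>       for (int i = 0; i < properties.getLength(); i++) {
>         Element property = (Element) properties.item(i);
>         String key = property.getElementsByTagName("name")
>             .item(0).getTextContent();
>         String value = property.getElementsByTagName("value")
>             .item(0).getTextContent();
>         items.add(new ParsedItem(key, value));
>       }
>     } catch (Exception e) {
>       throw new RuntimeException("Can't load configuration from " + confUrl, e);
>     }
>     return items;
>   }
> }{code}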
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org