Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2019/01/22 14:16:00 UTC

[jira] [Commented] (HADOOP-16064) Load configuration values from external sources

    [ https://issues.apache.org/jira/browse/HADOOP-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748760#comment-16748760 ] 

Steve Loughran commented on HADOOP-16064:
-----------------------------------------

Interesting idea.

This is still a fairly raw patch and it's going to have to go through a fair few iterations before it is ready to go in. Configuration is such a stable and foundational class that we need to tread very carefully when doing radical things to it. I'm not saying there aren't benefits, only that we mustn't rush into supporting something which turns out not to be ready.

There are some serious security implications here. I hope the (forthcoming) documentation has a section on them. [~lmccay] will no doubt have opinions.

And failure handling, obviously: I don't see any here other than e.printStackTrace().

 

patch-wise

* ozone/hdds pom changes are urgent and should go in their own patch.
* mark everything as private/unstable
* import ordering should be [java, third-party, org-apache, static].
* use final fields where possible
* a lot of the class names are too generic ("env", "stub"); please make them clearer
* and add javadocs to the classes, methods and production packages

h3. {{ConfigurationLocator}}
* must use logs for printing
* go on, add some javadocs
* L71 toLowerCase() needs a locale
* lacks resilience to, or a strategy for, failures when service implementations fail to load
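On the locale point: under a Turkish default locale, toLowerCase() maps "I" to dotless "ı", which silently breaks scheme matching. A minimal sketch of the fix (class and method names hypothetical):

```java
import java.util.Locale;

public final class SchemeNormalizer {

  private SchemeNormalizer() {
  }

  /** Lower-case a URI scheme without depending on the JVM's default locale. */
  public static String normalize(String scheme) {
    // Locale.ROOT guarantees "FILE" -> "file" even when the default
    // locale is tr_TR, where plain toLowerCase() would yield "fıle".
    return scheme.toLowerCase(Locale.ROOT);
  }
}
```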

h3. {{Env}}

* I don't like this class name; too close to system stuff, please use something a bit longer/clearer
* what does the prefix field do?

h3. {{HadoopWeb}}

* again, the name bears almost no relation to what it does, which is "load hadoop configuration from a remote site"
* and again, needs to log errors through SLF4J
* include the URL which is causing problems in the error message. Without this your support team will never forgive you.

I don't see any resilience to the classic problem of "page returns HTML": check the content-type before trying to parse the response.
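One cheap guard, sketched here with the JDK's HttpURLConnection (class and method names are hypothetical, not from the patch):

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public final class ConfFetcher {

  /** True iff the Content-Type header looks like XML; charset suffixes allowed. */
  public static boolean isXml(String contentType) {
    return contentType != null
        && (contentType.startsWith("text/xml")
            || contentType.startsWith("application/xml"));
  }

  /**
   * Open a configuration resource, failing fast on error responses and on
   * HTML served up by captive portals or proxy error pages. The failing
   * URL goes into every exception message.
   */
  public static InputStream openConfStream(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    int status = conn.getResponseCode();
    if (status != HttpURLConnection.HTTP_OK) {
      throw new IOException("HTTP " + status
          + " fetching configuration from " + url);
    }
    if (!isXml(conn.getContentType())) {
      throw new IOException("Expected XML from " + url
          + " but got Content-Type: " + conn.getContentType());
    }
    return conn.getInputStream();
  }
}
```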


h3. TestHadoopWeb

* doesn't shut down server.
* doesn't stress the code with invalid URLs, error responses, non-XML data

Proposed:
* do the server setup/teardown in before/after class.
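One way to sketch the fixture that the @BeforeClass/@AfterClass pair should manage, using the JDK's built-in com.sun.net.httpserver (the class name ConfServerFixture is hypothetical):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public final class ConfServerFixture {

  private HttpServer server;

  /** What @BeforeClass should do: start an in-process server serving /conf. */
  public void start() throws Exception {
    server = HttpServer.create(new InetSocketAddress(0), 0);
    server.createContext("/conf", exchange -> {
      byte[] body = "<configuration/>".getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().set("Content-Type", "text/xml");
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
  }

  /** Ephemeral port picked by the OS; tests build their URLs from this. */
  public int port() {
    return server.getAddress().getPort();
  }

  /** What @AfterClass should do: always free the port, even after failures. */
  public void stop() {
    if (server != null) {
      server.stop(0);
    }
  }
}
```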




> Load configuration values from external sources
> -----------------------------------------------
>
>                 Key: HADOOP-16064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16064
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>         Attachments: HADOOP-16064.001.patch
>
>
> This is a proposal to improve Configuration.java to load configuration from external sources (kubernetes config map, external HTTP request, any cluster manager like Ambari, etc.)
> I will attach a patch to illustrate the proposed solution, but please comment on the concept first; the patch is just a PoC and not fully implemented.
> *Goals:*
>  * Load the configuration files (core-site.xml/hdfs-site.xml/...) from external locations instead of the classpath (classpath remains the default)
>  * Make the configuration loading extensible
>  * Do it in a backward-compatible way with minimal change to the existing Configuration.java
> *Use-cases:*
>  1.) load configuration from the namenode ([http://namenode:9878/conf]). With this approach only the namenode needs to be configured; the other components require only the URL of the namenode
>  2.) Read configuration directly from kubernetes config-map (or mesos)
>  3.) Read configuration from any external cluster management (such as Apache Ambari or any equivalent)
>  4.) As of now, in the Hadoop docker images, we transform environment variables (such as HDFS-SITE.XML_fs.defaultFs) into configuration XML files with the help of a python script. With the proposed implementation it would be possible to read the configuration directly from the system environment variables.
> *Problem:*
> The existing Configuration.java can read configuration from multiple sources. But most of the time it's used to load predefined config names ("core-site.xml" and "hdfs-site.xml") without configuration location. In this case the files will be loaded from the classpath.
> I propose to add an additional option to define the default location of core-site.xml and hdfs-site.xml (any configuration which is defined by string name), so that external sources can be used instead of the classpath.
> The configuration loading requires implementation + configuration (where are the external configs). We can't use regular configuration to configure the config loader (chicken/egg).
> I propose to use a new environment variable HADOOP_CONF_SOURCE
> The environment variable could contain a URL, where the schema of the url can define the config source and all the other parts can configure the access to the resource.
> Examples:
> HADOOP_CONF_SOURCE=hadoop-[http://namenode:9878/conf]
> HADOOP_CONF_SOURCE=env://prefix
> HADOOP_CONF_SOURCE=k8s://config-map-name
> The ConfigurationSource interface can be as easy as:
> {code:java}
> /**
>  * Interface to load hadoop configuration from custom location.
>  */
> public interface ConfigurationSource {
>   /**
>  * Method will be called once with the defined configuration URI.
>    *
>    * @param uri
>    */
>   void initialize(URI uri) throws IOException;
>   /**
>    * Method will be called to load a specific configuration resource.
>    *
>    * @param name of the configuration resource (eg. hdfs-site.xml)
>  * @return List of loaded configuration keys and values.
>    */
>   List<ParsedItem> readConfiguration(String name);
> }{code}
> We can choose the right implementation based on the scheme of the URI, using the Java Service Provider Interface mechanism (META-INF/services/org.apache.hadoop.conf.ConfigurationSource)
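> For example, the two implementations below would register themselves for ServiceLoader lookup via a provider-configuration file such as (file contents illustrative, class names taken from the attached patch):
> {code}
> # META-INF/services/org.apache.hadoop.conf.ConfigurationSource
> org.apache.hadoop.conf.location.Env
> org.apache.hadoop.conf.location.HadoopWeb
> {code}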
> It could be done with minimal modification of Configuration.java (see the attached patch as an example)
>  The patch contains two example implementations:
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/Env.java*
> This can load configuration from environment variables based on a naming convention (eg. HDFS-SITE.XML_hdfs.dfs.key=value)
> *hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/location/HadoopWeb.java*
>  This implementation can load the configuration from a /conf servlet of any Hadoop components.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
