Posted to common-issues@hadoop.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2015/09/01 06:15:46 UTC

[jira] [Commented] (HADOOP-12118) Validate xml configuration files with XML Schema

    [ https://issues.apache.org/jira/browse/HADOOP-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724763#comment-14724763 ] 

Kengo Seki commented on HADOOP-12118:
-------------------------------------

Sorry for the late response, [~gliptak]. I agree with the backport, since it should be useful for 2.x users. But I haven't been able to decide whether we should adopt XSD for validation.
One possibility is to use XSD for basic structure validation and XPath (for example) for the advanced validations I mentioned above, which avoids walking the XML directly. For example:

{code}
    // Select the text of every <name> under /configuration/property.
    XPath xpath = XPathFactory.newInstance().newXPath();
    NodeList nodes = (NodeList) xpath.evaluate("/configuration/property/name/text()",
        new InputSource("core-site.xml"), XPathConstants.NODESET);
    // Set.add() returns false when the name was already seen.
    Set<String> seen = new HashSet<String>();
    for (int i = 0; i < nodes.getLength(); i++) {
      String name = nodes.item(i).getTextContent();
      if (!seen.add(name)) {
        System.err.println("Found duplicated property: " + name);
      }
    }
{code}

This would significantly improve code readability and maintainability, but I have one concern: it probably can't report the line numbers where problems occurred, because DOM doesn't keep element positions. That is a regression of sorts, but fortunately (or unfortunately?) 3.0 has not been released yet, so it may be an acceptable trade-off for code simplicity at this point.
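
For the structural half of that split, here is a rough sketch of what XSD validation through javax.xml.validation could look like. It collects each schema violation together with the line number carried by the SAXParseException, so at least the structural errors would keep their position information. The inline schema and the ConfigSchemaCheck class are made up for illustration (this is not the attached hadoop-configuration.xsd, and not a proposed patch), and the XPath-based checks above would still lack line numbers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class ConfigSchemaCheck {
  // Hypothetical minimal schema written only for this example;
  // it is NOT the hadoop-configuration.xsd attached to the issue.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='configuration'><xs:complexType><xs:sequence>"
      + "<xs:element name='property' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='value' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

  /** Returns one "line N: message" entry per schema violation. */
  static List<String> validate(String xml) throws Exception {
    SchemaFactory factory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
    Validator validator = schema.newValidator();
    final List<String> problems = new ArrayList<String>();
    validator.setErrorHandler(new ErrorHandler() {
      // Record recoverable problems so validation continues past the first one.
      public void warning(SAXParseException e) { record(e); }
      public void error(SAXParseException e) { record(e); }
      // Malformed XML cannot be recovered from; let it propagate.
      public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
      }
      private void record(SAXParseException e) {
        problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
      }
    });
    validator.validate(new StreamSource(new StringReader(xml)));
    return problems;
  }

  public static void main(String[] args) throws Exception {
    // The typo from the issue description: <key> instead of <name>.
    String xml = "<configuration>\n"
        + "<property>\n"
        + "<key>fs.defaultFS</key>\n"
        + "<value>hdfs://localhost:9000</value>\n"
        + "</property>\n"
        + "</configuration>\n";
    for (String problem : validate(xml)) {
      System.err.println(problem);
    }
  }
}
```

With a real schema, the Schema object could be built once and reused for core-site.xml, hdfs-site.xml and so on, since only the Validator is per-document.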

Thoughts?

> Validate xml configuration files with XML Schema
> ------------------------------------------------
>
>                 Key: HADOOP-12118
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12118
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Christopher Tubbs
>         Attachments: HADOOP-7947.branch-2.1.patch, hadoop-configuration.xsd
>
>
> I spent an embarrassingly long time today trying to figure out why the following wouldn't work.
> {code}
> <property>
>   <key>fs.defaultFS</key>
>   <value>hdfs://localhost:9000</value>
> </property>
> {code}
> I just kept getting an error about no authority for {{fs.defaultFS}}, with a value of {{file:///}}, which made no sense... because I knew it was there.
> The problem was that the {{core-site.xml}} was parsed entirely without any validation. This seems incorrect. The very least that could be done is a simple XML Schema validation against an XSD, before parsing. That way, users will get immediate failures on common typos and other problems in the xml configuration files.


