Posted to common-issues@hadoop.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2015/03/03 00:00:09 UTC

[jira] [Updated] (HADOOP-7947) Validate XMLs if a relevant tool is available, when using scripts

     [ https://issues.apache.org/jira/browse/HADOOP-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kengo Seki updated HADOOP-7947:
-------------------------------
    Attachment: HADOOP-7947.002.patch

Attaching revised patch.

In the first patch, I reused o.a.h.conf.Configuration’s logic for validation, but DOM doesn’t keep each element’s position.
Next I tried SAX with XML Schema validation, but that was too strict for this purpose (e.g. a sub-element of <configuration> can be anything, but we’d like to detect when it’s not a <property>).
In the end, I used StAX and wrote the validation logic from scratch.
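
To illustrate why StAX fits here, below is a minimal sketch (not code from the patch) showing that a StAX reader exposes the position of each element as it is read. The class name StaxLineNumbers and the method elementLines are hypothetical names chosen for this example; only the javax.xml.stream calls are standard JDK API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of the attached patch.
class StaxLineNumbers {

    /** Returns one "line N: <tag>" entry per start element found in the XML text. */
    static List<String> elementLines(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // The parser reports its current position for each event,
                    // which a DOM tree does not retain.
                    int line = reader.getLocation().getLineNumber();
                    out.add("line " + line + ": <" + reader.getLocalName() + ">");
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Wrapped as unchecked to keep the sketch short.
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return out;
    }
}
```

The exact line reported for a multi-line tag may depend on the StAX implementation in use, so treat the numbers as a reporting aid.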

This function detects the following cases:

* Not well-formed XML
* XML is well-formed, but
** top-level element is not <configuration>
** sub-element of <configuration> is not <property>
** the same attribute and/or element is defined more than once in a <property>
** <name> or <value> does not exist in <property>
** <name> is empty (empty <value> can be valid in some properties, so it isn’t detected)
** duplicated <property>s with the same <name> value exist
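
The checks listed above could be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the code in the attached patch: the class name ConfCheckSketch, the method check, and the message wording are hypothetical, and it takes the XML as a string rather than a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of the listed checks; not the patch's implementation.
class ConfCheckSketch {

    /** Returns a list of problems found; an empty list means the XML looks valid. */
    static List<String> check(String xml) {
        List<String> errors = new ArrayList<>();
        Map<String, Integer> firstSeen = new HashMap<>(); // property name -> line first defined
        int depth = 0, propLine = -1, names = 0, values = 0;
        boolean inProperty = false;
        String name = null;
        try {
            XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int ev = r.next();
                if (ev == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    int line = r.getLocation().getLineNumber();
                    String tag = r.getLocalName();
                    if (depth == 1 && !tag.equals("configuration")) {
                        errors.add("Line " + line + ": top-level element not <configuration>");
                        return errors;
                    } else if (depth == 2) {
                        inProperty = tag.equals("property");
                        if (!inProperty) {
                            errors.add("Line " + line + ": element not <property>");
                        } else {
                            propLine = line; names = 0; values = 0; name = null;
                            // name/value may also be given as attributes of <property>
                            String attrName = r.getAttributeValue(null, "name");
                            if (attrName != null) { names++; name = attrName; }
                            if (r.getAttributeValue(null, "value") != null) values++;
                        }
                    } else if (depth == 3 && inProperty) {
                        if (tag.equals("name")) {
                            names++; name = r.getElementText();
                            depth--; // getElementText() consumed the end tag
                        } else if (tag.equals("value")) {
                            values++; r.getElementText();
                            depth--; // getElementText() consumed the end tag
                        }
                    }
                } else if (ev == XMLStreamConstants.END_ELEMENT) {
                    if (depth == 2 && inProperty) {
                        if (names == 0) {
                            errors.add("Line " + propLine + ": <property> has no <name>");
                        } else if (names > 1) {
                            errors.add("Line " + propLine + ": <property> has duplicated <name>s");
                        } else if (name.isEmpty()) {
                            errors.add("Line " + propLine + ": <property> has an empty <name>");
                        } else {
                            Integer prev = firstSeen.putIfAbsent(name, propLine);
                            if (prev != null) {
                                errors.add("Line " + prev + ", " + propLine
                                    + ": duplicated <property>s for " + name);
                            }
                        }
                        if (names > 0 && values == 0 || names == 0 && values == 0) {
                            errors.add("Line " + propLine + ": <property> has no <value>");
                        } else if (values > 1) {
                            errors.add("Line " + propLine + ": <property> has duplicated <value>s");
                        }
                        inProperty = false;
                    }
                    depth--;
                }
            }
        } catch (XMLStreamException e) {
            // Covers the "not well-formed" case (and unexpected nested content).
            errors.add("XML error: " + e.getMessage());
        }
        return errors;
    }
}
```

An empty <value> is deliberately not reported here, matching the note above that it can be valid for some properties.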

Execution examples:
{code}
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/capacity-scheduler.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/core-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hadoop-policy.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hdfs-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/httpfs-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-acls.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/yarn-site.xml: valid
OK
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
0
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ cat ~/bad-manners.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <non-property/>

  <property/>

  <property>
    <name/>
    <value/>
  </property>

  <property>
    <name>a</name>
    <value>b</value>
  </property>

  <property name="c">
    <name>c</name>
    <value>d</value>
  </property>

  <property>
    <name>e</name>
    <value>f</value>
  </property>

  <property>
    <name>e</name>
    <value>g</value>
  </property>

</configuration>
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest -conf etc/hadoop/core-site.xml -conf ~/bad-manners.xml 
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml: valid
/Users/sekikn/bad-manners.xml:
	Line 5: element not <property>
	Line 7: <property> has no <name>
	Line 7: <property> has no <value>
	Line 9: <property> has an empty <name>
	Line 19: <property> has duplicated <name>s
	Line 24, 29: duplicated <property>s for e
Invalid file exists
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
1
{code}

> Validate XMLs if a relevant tool is available, when using scripts
> -----------------------------------------------------------------
>
>                 Key: HADOOP-7947
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7947
>             Project: Hadoop Common
>          Issue Type: Wish
>          Components: scripts
>    Affects Versions: 2.7.0
>            Reporter: Harsh J
>            Assignee: Kengo Seki
>              Labels: newbie
>         Attachments: HADOOP-7947.001.patch, HADOOP-7947.002.patch
>
>
> Given that we are locked down to using only XML for configuration and most administrators need to manage it by themselves (unless a tool that manages it for them is used), it would be good to also validate the provided config XML (*-site.xml) files with a tool like {{xmllint}} or maybe Xerces somehow, when running a command or (at least) when starting up daemons.
> We should use this only if a relevant tool is available, and optionally be silent if the environment requests it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)