Posted to common-issues@hadoop.apache.org by "Kengo Seki (JIRA)" <ji...@apache.org> on 2015/03/03 00:00:09 UTC
[jira] [Updated] (HADOOP-7947) Validate XMLs if a relevant tool is available, when using scripts
[ https://issues.apache.org/jira/browse/HADOOP-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kengo Seki updated HADOOP-7947:
-------------------------------
Attachment: HADOOP-7947.002.patch
Attaching a revised patch.
In the first patch, I reused o.a.h.conf.Configuration’s validation logic, but DOM doesn’t keep each element’s position in the source, so problems couldn’t be reported with line numbers.
So I tried SAX with XML Schema validation, but it was too strict for this purpose (e.g. a sub-element of <configuration> can be anything, but we’d like to detect one that isn’t a <property>).
Finally, I used StAX and wrote the validation logic from scratch.
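For illustration only, here is a minimal Java sketch (not code from the patch) of the property that makes StAX a good fit here: as the reader streams through the file, it can report the line of each start tag, which a DOM tree does not record. The class and method names (StaxLineDemo, startTagLines) are hypothetical; only the standard javax.xml.stream API is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical demo: StAX exposes the source line of every element it reads. */
class StaxLineDemo {

  /** Returns "elementName@line" for each start tag, in document order. */
  static List<String> startTagLines(String xml) throws XMLStreamException {
    List<String> out = new ArrayList<>();
    XMLStreamReader r =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    try {
      while (r.hasNext()) {
        if (r.next() == XMLStreamConstants.START_ELEMENT) {
          // getLocation() reports where the reader currently is, i.e. at the start tag just read.
          out.add(r.getLocalName() + "@" + r.getLocation().getLineNumber());
        }
      }
    } finally {
      r.close();
    }
    return out;
  }

  public static void main(String[] args) throws XMLStreamException {
    String xml = "<configuration>\n"
        + "  <property>\n"
        + "    <name>a</name>\n"
        + "  </property>\n"
        + "</configuration>\n";
    System.out.println(startTagLines(xml));
  }
}
```

Running main should print [configuration@1, property@2, name@3]; those per-element line numbers are what a validator needs in order to point at the offending lines.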
This function detects the following cases:
* Not well-formed XML
* Well-formed XML in which
** the top-level element is not <configuration>
** a sub-element of <configuration> is not <property>
** the same attribute and/or element is defined more than once in a <property>
** <name> or <value> is missing from a <property>
** <name> is empty (an empty <value> can be valid for some properties, so it isn’t reported)
** multiple <property>s have the same <name> value
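The checks listed above can be sketched with StAX roughly as follows. This is a hypothetical illustration, not the code in HADOOP-7947.002.patch: the class name ConfXmlChecker and its message strings are assumptions modeled on the sample output that follows, and it treats a name="..." attribute and a <name> child element as two definitions of the same field, which is one reading of the duplicate-definition rule.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical StAX-based checker for Hadoop-style *-site.xml content. */
class ConfXmlChecker {

  /** Returns the problems found, in document order; an empty list means none were detected. */
  static List<String> check(String xml) {
    List<String> errors = new ArrayList<>();
    // property name -> start lines of the <property> elements that define it
    Map<String, List<Integer>> nameLines = new LinkedHashMap<>();
    try {
      XMLStreamReader r =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      int depth = 0, propLine = 0;
      Map<String, List<String>> fields = null; // field -> values seen in the current <property>
      String field = null;                      // child of <property> currently being read
      StringBuilder text = new StringBuilder();
      while (r.hasNext()) {
        int ev = r.next();
        if (ev == XMLStreamConstants.START_ELEMENT) {
          depth++;
          String tag = r.getLocalName();
          int line = r.getLocation().getLineNumber();
          if (depth == 1 && !tag.equals("configuration")) {
            errors.add("Line " + line + ": top-level element not <configuration>");
          } else if (depth == 2 && !tag.equals("property")) {
            errors.add("Line " + line + ": element not <property>");
          } else if (depth == 2) {
            propLine = line;
            fields = new LinkedHashMap<>();
            // An attribute such as name="c" counts as a definition, just like <name>c</name>.
            for (int i = 0; i < r.getAttributeCount(); i++) {
              fields.computeIfAbsent(r.getAttributeLocalName(i), k -> new ArrayList<>())
                  .add(r.getAttributeValue(i).trim());
            }
          } else if (depth == 3 && fields != null) {
            field = tag;
            text.setLength(0);
          }
        } else if ((ev == XMLStreamConstants.CHARACTERS || ev == XMLStreamConstants.CDATA)
            && field != null) {
          text.append(r.getText());
        } else if (ev == XMLStreamConstants.END_ELEMENT) {
          if (depth == 3 && field != null) {
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(text.toString().trim());
            field = null;
          } else if (depth == 2 && fields != null) {
            checkProperty(propLine, fields, errors, nameLines);
            fields = null;
          }
          depth--;
        }
      }
      r.close();
    } catch (XMLStreamException e) {
      errors.add("not well-formed: " + e.getMessage());
      return errors;
    }
    // Duplicate names can only be judged once the whole file has been read.
    for (Map.Entry<String, List<Integer>> e : nameLines.entrySet()) {
      if (e.getValue().size() > 1) {
        String lines = e.getValue().stream().map(String::valueOf)
            .collect(Collectors.joining(", "));
        errors.add("Line " + lines + ": duplicated <property>s for " + e.getKey());
      }
    }
    return errors;
  }

  private static void checkProperty(int line, Map<String, List<String>> fields,
      List<String> errors, Map<String, List<Integer>> nameLines) {
    List<String> names = fields.getOrDefault("name", new ArrayList<>());
    if (names.isEmpty()) errors.add("Line " + line + ": <property> has no <name>");
    if (!fields.containsKey("value")) errors.add("Line " + line + ": <property> has no <value>");
    for (Map.Entry<String, List<String>> e : fields.entrySet()) {
      if (e.getValue().size() > 1) {
        errors.add("Line " + line + ": <property> has duplicated <" + e.getKey() + ">s");
      }
    }
    // An empty <value> is allowed; an empty <name> is not.
    if (names.size() == 1 && names.get(0).isEmpty()) {
      errors.add("Line " + line + ": <property> has an empty <name>");
    } else if (names.size() == 1) {
      nameLines.computeIfAbsent(names.get(0), k -> new ArrayList<>()).add(line);
    }
  }
}
```

For example, a document whose lines 2 and 3 each hold a <property> named a should yield the single entry "Line 2, 3: duplicated <property>s for a". Because duplicate names can only be detected after the whole file has been read, the sketch reports them last, which matches the order of the sample output.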
Execution examples:
{code}
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/capacity-scheduler.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/core-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hadoop-policy.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/hdfs-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/httpfs-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-acls.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/kms-site.xml: valid
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/./etc/hadoop/yarn-site.xml: valid
OK
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
0
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ cat ~/bad-manners.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

  <non-property/>

  <property/>

  <property>
    <name/>
    <value/>
  </property>

  <property>
    <name>a</name>
    <value>b</value>
  </property>

  <property name="c">
    <name>c</name>
    <value>d</value>
  </property>

  <property>
    <name>e</name>
    <value>f</value>
  </property>

  <property>
    <name>e</name>
    <value>g</value>
  </property>
</configuration>
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ env HADOOP_PREFIX=. bin/hadoop conftest -conf etc/hadoop/core-site.xml -conf ~/bad-manners.xml
/Users/sekikn/hadoop/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml: valid
/Users/sekikn/bad-manners.xml:
Line 5: element not <property>
Line 7: <property> has no <name>
Line 7: <property> has no <value>
Line 9: <property> has an empty <name>
Line 19: <property> has duplicated <name>s
Line 24, 29: duplicated <property>s for e
Invalid file exists
[sekikn@mobile hadoop-3.0.0-SNAPSHOT]$ echo $?
1
{code}
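The session above follows a simple contract: one report per file ("<path>: valid", or "<path>:" followed by its problems), a closing "OK" or "Invalid file exists", and exit status 0 or 1. The hypothetical driver below sketches that contract in Java; the names ConfTestDriver, check and run are assumptions, and to stay short its per-file check only tests well-formedness with StAX, standing in for the fuller checks listed earlier.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical driver reproducing the report format and exit status of the session above. */
class ConfTestDriver {

  /** Placeholder per-file check: well-formedness only, via StAX. */
  static List<String> check(Path file) {
    List<String> errors = new ArrayList<>();
    try (InputStream in = Files.newInputStream(file)) {
      XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
      while (r.hasNext()) {
        r.next(); // throws XMLStreamException at the first well-formedness error
      }
      r.close();
    } catch (IOException | XMLStreamException e) {
      errors.add(String.valueOf(e.getMessage()));
    }
    return errors;
  }

  /** Prints one report per file, then a summary; returns 0 if every file is valid, else 1. */
  static int run(List<Path> files) {
    boolean allValid = true;
    for (Path f : files) {
      List<String> errors = check(f);
      if (errors.isEmpty()) {
        System.out.println(f + ": valid");
      } else {
        allValid = false;
        System.out.println(f + ":");
        errors.forEach(System.out::println);
      }
    }
    System.out.println(allValid ? "OK" : "Invalid file exists");
    return allValid ? 0 : 1;
  }

  public static void main(String[] args) {
    List<Path> files = new ArrayList<>();
    for (String arg : args) {
      files.add(Paths.get(arg));
    }
    System.exit(run(files));
  }
}
```

Keeping the status in the exit code, rather than only in the printed summary, is what lets a wrapper script branch on the result with a plain $? check, as in the session above.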
> Validate XMLs if a relevant tool is available, when using scripts
> -----------------------------------------------------------------
>
> Key: HADOOP-7947
> URL: https://issues.apache.org/jira/browse/HADOOP-7947
> Project: Hadoop Common
> Issue Type: Wish
> Components: scripts
> Affects Versions: 2.7.0
> Reporter: Harsh J
> Assignee: Kengo Seki
> Labels: newbie
> Attachments: HADOOP-7947.001.patch, HADOOP-7947.002.patch
>
>
> Given that we are locked down to using only XML for configuration, and that most administrators need to manage it themselves (unless they use a tool that manages it for them), it would be good to also validate the provided config XML (*-site.xml) files with a tool like {{xmllint}} or perhaps Xerces, when running a command or (at least) when starting up daemons.
> We should use this only if a relevant tool is available, and optionally stay silent if the environment requests it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)