Posted to common-issues@hadoop.apache.org by "Ari Rabkin (JIRA)" <ji...@apache.org> on 2010/11/15 23:22:15 UTC
[jira] Created: (HADOOP-7036) spellcheck for configuration
spellcheck for configuration
----------------------------
Key: HADOOP-7036
URL: https://issues.apache.org/jira/browse/HADOOP-7036
Project: Hadoop Common
Issue Type: New Feature
Components: conf
Reporter: Ari Rabkin
Assignee: Ari Rabkin
Hadoop performs only limited correctness checks on its configuration. I propose a "configuration spellcheck" that can automatically catch errors, and in particular can catch cases where users mistype the name of an option.
The system works as follows:
- Use program analysis to extract the set of options supported by each Hadoop version, annotated where possible with their types, into a 'dictionary file'.
- Distribute these extracted sets, one per version.
- Provide a script that reads a dictionary file, reads the Hadoop config from a specified directory, and reports deviations. In particular, the system can report when an option is set that Hadoop will never read, or when an invalid value is specified.
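A minimal sketch of the checking step, in Python, assuming a dictionary that maps option names to simple type tags. `DICTIONARY`, `read_hadoop_conf`, `value_ok` and `spellcheck` are hypothetical names, not part of confspellcheck.jar, and the "did you mean" suggestion uses Python's `difflib` as a stand-in for whatever matching the real tool does. It reads the `<property>` name/value entries from each `*-site.xml` file in a directory, then reports names missing from the dictionary and values that fail their type check.

```python
import difflib
import xml.etree.ElementTree as ET
from pathlib import Path

# HYPOTHETICAL dictionary: option name -> type tag. The real dictionary files
# attached to this issue have their own format; these entries are examples.
DICTIONARY = {
    "fs.default.name": "uri",
    "dfs.replication": "int",
    "mapred.reduce.tasks": "int",
    "mapred.compress.map.output": "boolean",
}


def read_hadoop_conf(conf_dir):
    """Collect name -> value pairs from every *-site.xml file in conf_dir."""
    options = {}
    for path in sorted(Path(conf_dir).glob("*-site.xml")):
        root = ET.parse(path).getroot()
        for prop in root.iter("property"):
            name = (prop.findtext("name") or "").strip()
            if name:
                options[name] = (prop.findtext("value") or "").strip()
    return options


def value_ok(kind, value):
    """Rough type check for a few (assumed) type tags; others always pass."""
    if kind == "int":
        try:
            int(value)
        except ValueError:
            return False
        return True
    if kind == "boolean":
        return value.lower() in ("true", "false")
    return True


def spellcheck(options, dictionary):
    """Warn about options Hadoop never reads and values of the wrong type."""
    warnings = []
    for name, value in sorted(options.items()):
        if name not in dictionary:
            # Suggest the closest known option name, if any is similar enough.
            guess = difflib.get_close_matches(name, list(dictionary), n=1)
            hint = f" (did you mean {guess[0]}?)" if guess else ""
            warnings.append(f"{name} is never read by Hadoop{hint}")
        elif not value_ok(dictionary[name], value):
            warnings.append(f"{name}={value!r} is not a valid {dictionary[name]}")
    return warnings
```

Given a config directory whose hdfs-site.xml sets `dfs.replicaton`, calling `spellcheck(read_hadoop_conf(conf_dir), DICTIONARY)` would flag that name and suggest `dfs.replication`.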
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ari Rabkin updated HADOOP-7036:
-------------------------------
Attachment: confspellcheck.jar
Jar for the spellchecker. It should go in the contrib/spellcheck directory.
Source code available from http://code.google.com/p/jchord/source/browse/#svn/trunk/conf_spellchecker/
Available under the BSD license.
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932245#action_12932245 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
Actually, as I think a bit more, XML schema isn't quite as powerful as this approach. The spellchecker tool is able to check constraints like "this option must be a writable local file", which don't fit into a schema.
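To illustrate the kind of check that does not fit in a schema, here is a rough Python approximation of the constraint "this option must be a writable local file". The function name and the exact rules (strip a `file:` prefix, reject other URI schemes, accept a not-yet-existing file if its nearest existing ancestor directory is writable) are assumptions for illustration, not the spellchecker's actual logic. Note that the answer depends on the machine where the check runs, which a static schema cannot express.

```python
import os
from pathlib import Path


def is_writable_local_file(value):
    """Approximate "this option must be a writable local file" (hypothetical rules)."""
    # Non-local URI schemes (e.g. hdfs://) can never name a local file.
    if "://" in value and not value.startswith("file://"):
        return False
    if value.startswith("file://"):
        value = value[len("file://"):]
    elif value.startswith("file:"):
        value = value[len("file:"):]
    path = Path(value)
    if path.exists():
        # An existing path must be a regular file we are allowed to write.
        return path.is_file() and os.access(path, os.W_OK)
    # Otherwise the file could be created if its nearest existing
    # ancestor directory is writable.
    parent = path.parent
    while not parent.exists():
        if parent == parent.parent:  # reached the filesystem root
            return False
        parent = parent.parent
    return parent.is_dir() and os.access(parent, os.W_OK)
```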
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12964916#action_12964916 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
The jar is now in Maven. I'll revise the patch to pull it in that way.
The tool is intended for ops folks to some extent, and for novice users to an even greater extent. I'm not sure which way that pushes the packaging question. As I understand it, this is different in the v20 branch and the v21 branch. Is there a document somewhere summarizing what goes in contrib for each Hadoop branch?
I don't understand the right way to divide stuff up amongst projects. Seems like a hassle to have the Mapred dictionary in one patch against one project, the HDFS dictionary in another, and the common options and the invoke script in a third. Can that really be the right way to go?
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971995#action_12971995 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
The code is already released under the Apache license (on Google Code, not GitHub).
The reason I want it integrated with Hadoop is that it's a tool primarily designed to help novice users, who are very unlikely to go off and install some small little component that they've never heard of. Almost all the value inheres in being "on by default."
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933141#action_12933141 ]
Konstantin Boudnik commented on HADOOP-7036:
--------------------------------------------
Good idea, I like it. However, the jar file needs to be included via an Ivy dependency declaration, not committed directly to SVN.
Also, it seems that it needs to be split between the projects. E.g. Common shouldn't know anything about HDFS- or MR-specific configuration options.
One more nit: the tool sounds more like a nice addition for Ops (cluster operations) folks or whoever else needs to create their own configurations. Perhaps it belongs in the HDFS/MR contribs rather than in Common.
+1 on the idea, though!
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932260#action_12932260 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
It is already a separate project, on Google Code; it's a subcomponent of http://code.google.com/p/jchord/
But I thought it made sense to include the Hadoop-specific scripts and the [hopefully human-checked] Hadoop dictionary files in Hadoop contrib. This improves visibility and also benefits the Hadoop community by helping users avoid what I gather is a significant problem -- mis-spelled option names.
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971963#action_12971963 ]
Konstantin Boudnik commented on HADOOP-7036:
--------------------------------------------
I tend to agree with you about the overhead. Unfortunately, this is how things have been since the 3-way split we went through a couple of years ago. Perhaps Eli's point makes sense, and it would be a good idea to make it a separate project under the Apache license, with its own artifact, on GitHub?
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971795#action_12971795 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
I've been overloaded with other things and am only now getting back to this. I'm still unsure what the right way to package this is.
- The jar is now on Maven.
- I have separate HDFS and MapReduce dictionary files.
Where should I put the script that launches the thing? Separate scripts for MapReduce and HDFS? That seems very wasteful.
Do I need to open a pair of new JIRAs, one each for MAPREDUCE and HDFS?
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932236#action_12932236 ]
Allen Wittenauer commented on HADOOP-7036:
------------------------------------------
Errr, don't we just need a schema definition? Then we can do this with any number of XML tools.
[jira] Updated: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ari Rabkin updated HADOOP-7036:
-------------------------------
Attachment: hadoopSpellcheck.patch
Includes dictionary files for (and was tested with) 0.20.2 and 0.21.0
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Ari Rabkin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932244#action_12932244 ]
Ari Rabkin commented on HADOOP-7036:
------------------------------------
It would be possible to use XML schema to do the enforcement. I opted for this strategy so I could reuse the spellcheck component for other systems that use non-XML key-value configuration.
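A sketch of that design choice, assuming two hypothetical input parsers: once Hadoop XML and plain `key = value` text are both reduced to a name-to-value map, the same format-independent check applies to either. The function names are illustrative only, and the `key = value` grammar is deliberately simplified (real Java `.properties` files have additional separator, escaping and continuation rules).

```python
import xml.etree.ElementTree as ET


def pairs_from_hadoop_xml(text):
    """Parse Hadoop-style <configuration><property><name/><value/></property>."""
    root = ET.fromstring(text)
    return {
        p.findtext("name", "").strip(): p.findtext("value", "").strip()
        for p in root.iter("property")
        if p.findtext("name")
    }


def pairs_from_key_value(text):
    """Parse simplified 'key = value' lines; lines starting with '#' are comments."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def unknown_keys(pairs, known):
    """Format-independent core: option names that nothing reads."""
    return sorted(k for k in pairs if k not in known)
```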
The hard part here isn't the enforcement per se, it's automatically extracting the schema and keeping it up to date for each version. That's the real contribution here; I'm undertaking to keep those up to date, using program analysis.
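As a deliberately crude stand-in for the program analysis described here (not the author's technique), a regex can pull literal keys out of Hadoop `Configuration` getter calls and infer a type tag from the getter name. It misses keys held in constants or built at runtime, and it also matches `.get("...")` on unrelated objects, which is why a real analysis is needed to keep the dictionaries accurate. The type tags are assumptions for illustration.

```python
import re

# Type tag implied by each Configuration getter (tags are illustrative).
GETTER_TYPES = {
    "getInt": "int",
    "getLong": "long",
    "getFloat": "float",
    "getBoolean": "boolean",
    "getClass": "class",
    "getStrings": "string[]",
    "get": "string",
}

# Matches e.g. conf.getInt("dfs.replication", 3) when the key is a string literal.
CALL = re.compile(
    r'\.(getInt|getLong|getFloat|getBoolean|getClass|getStrings|get)\(\s*"([^"]+)"'
)


def extract_options(java_source):
    """Map each literal option name to the type implied by its getter."""
    found = {}
    for method, key in CALL.findall(java_source):
        # Prefer a specific type over plain "string" if a key is read both ways.
        if found.get(key, "string") == "string":
            found[key] = GETTER_TYPES[method]
    return found
```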
[jira] Commented: (HADOOP-7036) spellcheck for configuration
Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932256#action_12932256 ]
Eli Collins commented on HADOOP-7036:
-------------------------------------
Sounds like a good tool. Maybe it would be better as a separate project, e.g. on GitHub, than as part of core Hadoop?