Posted to common-issues@hadoop.apache.org by "Philip Zeyliger (JIRA)" <ji...@apache.org> on 2009/07/03 23:22:53 UTC

[jira] Commented: (HADOOP-6105) Provide a way to automatically handle backward compatibility of deprecated keys

    [ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727143#action_12727143 ] 

Philip Zeyliger commented on HADOOP-6105:
-----------------------------------------

I'm not enamored of this approach and would like to propose
a slightly heavier-weight, but, I think, cleaner approach
than stuffing more logic into the Configuration class.
My apologies for coming to this conversation a bit late.

If you don't want to read a long e-mail, skip down to the code examples
at the bottom. :)

Before I get to the proposal, I wanted to lay out what I think the goals
are.  Note that HADOOP-475 is also related.

* Standardization of configuration names, documentation, and 
value formats.  Today, the names tend to appear in the code, or, at best,
in constants in the code, and the documentation, when it exists,
may be in -default.xml.  It would be nice if it were very difficult
to avoid writing documentation for the variable you're introducing.
There are, and have been, a handful of bugs where the default
in the code differs from the default in the XML file, and
that gets really confusing.

* Backwards compatibility.  We'd love to rename "mapred.foo" and "mr.bar"
to be consistent, but we want to maintain backwards compatibility.
This ticket is all about that.

* Availability to user code.  Users should be able to use configuration the same way the core does.
Users pass information to their jobs via Configuration, and they should
use the same mechanism.  This is true today.

* Type-safety. Configurations have a handful of recurring types: number of bytes,
filename, URI, hostport combination, arrays of paths, etc.  The parsing
is done in an ad-hoc fashion, which is a shame, since it doesn't have to be.
It would be nice to have some generic runtime checking of configuration
parameters, too, and perhaps even ranges (that number can't be negative!).

* Upgradeability to a different configuration format.  I don't think we'll
leave a place where configuration has to be a key->value map (especially
because of "availability to user code"), but it would eventually be nice
if configuration could be queried from other places, or if the
values could have a bit more structure.  (For example, we could use XML
to separate out a list of paths, instead of blindly using comma-delimited,
unescaped text.)

* Development ease.  It ought to be easier to find the places where configuration
is used.  Today the best we can do is a grep, and then follow references
manually.

* Autogenerated documentation.  No-brainer.

* Ability to specify visibility, scope, and stability.  Along the lines of HADOOP-5073, configuration
variables should be classified as deprecated, unstable, evolving, and stable.  It would be
nice to introduce variables (say, that were used for tuning), with the expectation that they are
not part of the public API.  Use at your own risk sort of thing.
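
To make the type-safety goal above a bit more concrete, here is a minimal sketch of parsing one of the recurring types (a hostport combination) in a single place, with a range check on the port. The class name HostPortParser and its parse method are hypothetical, invented for illustration; nothing like this exists in Hadoop under that name.

```java
import java.net.InetSocketAddress;

// Hypothetical helper: one shared parser for "host:port" values,
// instead of ad-hoc string splitting at every call site.
final class HostPortParser {
  private HostPortParser() {}

  // 'key' is only used to produce a useful error message.
  static InetSocketAddress parse(String key, String raw) {
    int colon = raw.lastIndexOf(':');
    if (colon <= 0 || colon == raw.length() - 1) {
      throw new IllegalArgumentException(
          key + ": expected host:port but got '" + raw + "'");
    }
    String host = raw.substring(0, colon).trim();
    int port;
    try {
      port = Integer.parseInt(raw.substring(colon + 1).trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          key + ": port is not a number in '" + raw + "'", e);
    }
    // Range check: "that number can't be negative!"
    if (port < 0 || port > 65535) {
      throw new IllegalArgumentException(
          key + ": port out of range in '" + raw + "'");
    }
    // Unresolved, so parsing never triggers a DNS lookup.
    return InetSocketAddress.createUnresolved(host, port);
  }
}

class HostPortParserDemo {
  public static void main(String[] args) {
    InetSocketAddress addr = HostPortParser.parse("sample.address", "localhost:8020");
    System.out.println(addr.getHostString() + " / " + addr.getPort());
  }
}
```

The point is only that a bad value fails early, with the key name in the message, rather than somewhere deep in the code that happens to use it.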

My proposal is to represent every configuration variable that's accessed in
the Hadoop code by a static instance of a ConfigVariable<T> class.  The interface
is something like:

{code}
public interface ConfigVariable<T> {
  T get(Configuration conf);
  T getDefault();
  void set(Configuration conf, T value);
  String getHelp();
}
{code}
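
As a rough illustration of how an implementation of that interface might look, here is a self-contained sketch for an integer-valued variable. To keep it runnable without Hadoop on the classpath, a plain Map<String, String> stands in for Configuration; the class IntConfigVariable and its constructor are assumptions made for this example, not part of any existing API.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as the interface above, with Map<String, String>
// standing in for Hadoop's Configuration (assumption for this sketch).
interface ConfigVariable<T> {
  T get(Map<String, String> conf);
  T getDefault();
  void set(Map<String, String> conf, T value);
  String getHelp();
}

// Hypothetical integer-valued variable bound to a single key.
final class IntConfigVariable implements ConfigVariable<Integer> {
  private final String key;
  private final int defaultValue;
  private final String help;

  IntConfigVariable(String key, int defaultValue, String help) {
    this.key = key;
    this.defaultValue = defaultValue;
    this.help = help;
  }

  @Override
  public Integer get(Map<String, String> conf) {
    String raw = conf.get(key);
    // Parsing and defaulting live here, not at each call site.
    return raw == null ? defaultValue : Integer.parseInt(raw.trim());
  }

  @Override
  public Integer getDefault() {
    return defaultValue;
  }

  @Override
  public void set(Map<String, String> conf, Integer value) {
    conf.put(key, Integer.toString(value));
  }

  @Override
  public String getHelp() {
    return help;
  }
}

class IntConfigVariableDemo {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    IntConfigVariable sample =
        new IntConfigVariable("common.sample", 15, "Some help text");
    System.out.println(sample.get(conf)); // key unset, so the default is used
    sample.set(conf, 42);
    System.out.println(sample.get(conf)); // now read back from conf
  }
}
```

Because the default is held by the variable itself, there is only one place for it to be defined, which is what avoids the code-versus-XML mismatch mentioned earlier.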

There's more than one way to implement this.  Here's one proposal that uses
Java annotations:

{code}
  @ConfigDescription(help="Some help text", 
      visibility=Visibility.PUBLIC)
  @ConfigAccessors({
    @ConfigAccessor(name="common.sample"),
    @ConfigAccessor(name="core.sample", deprecated="Use common.sample instead")
  })
  public final static ConfigVariable<Integer> myConfigVariable = 
    ConfigVariables.newIntConfigVariable(15 /* default value */);
{code}
This approach would require pre-processing (at build time) the annotations
into a data file, and then, at runtime, querying this data file.
(It's not easily possible to get at the annotations on the
field from within myConfigVariable.)

I'm half-way to getting this working, and I actually think something
like the following would be better:
{code}
  @ConfigVariableDeclaration
  public final static ConfigVariable<URI> fsDefaultName = 
    ConfigVariableBuilder.newURI()
      .setDefault(null)
      .setHelp("Default filesystem")
      .setVisibility(Visibility.PUBLIC)
      .addAccessor("fs.default.name")
      .addDeprecatedAccessor("core.default.fs", "Use foo instead")
      .addValidator(new ValidateSupportedFilesystem());
{code}
This would still require build-time preprocessing (javac supports
this) to find the variables, instantiate them, and output
the documentation, but the rest of the processing is easy
at runtime.
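
Here is a minimal sketch of what the runtime half of the builder approach could look like, in particular the fallback from the preferred key to a deprecated one. Everything in it is hypothetical: the names BuiltVariable and VariableBuilder, the build() step, the use of Map<String, String> in place of Configuration, and the trivial scheme check standing in for ValidateSupportedFilesystem. The build-time preprocessing is not shown.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical runtime object produced by the builder.
final class BuiltVariable<T> {
  private final Function<String, T> parser;
  private final T defaultValue;
  private final String key;
  private final Map<String, String> deprecated; // old key -> advice
  private final List<Predicate<T>> validators;
  // A real version would log; a list keeps the sketch easy to inspect.
  final List<String> warnings = new ArrayList<>();

  BuiltVariable(Function<String, T> parser, T defaultValue, String key,
      Map<String, String> deprecated, List<Predicate<T>> validators) {
    this.parser = parser;
    this.defaultValue = defaultValue;
    this.key = key;
    this.deprecated = deprecated;
    this.validators = validators;
  }

  T get(Map<String, String> conf) {
    String raw = conf.get(key); // the preferred key always wins
    if (raw == null) {
      for (Map.Entry<String, String> e : deprecated.entrySet()) {
        raw = conf.get(e.getKey());
        if (raw != null) {
          warnings.add(e.getKey() + " is deprecated: " + e.getValue());
          break;
        }
      }
    }
    if (raw == null) {
      return defaultValue;
    }
    T value = parser.apply(raw);
    for (Predicate<T> v : validators) {
      if (!v.test(value)) {
        throw new IllegalArgumentException("Invalid value for " + key + ": " + raw);
      }
    }
    return value;
  }
}

final class VariableBuilder<T> {
  private final Function<String, T> parser;
  private T defaultValue;
  private String key;
  private final Map<String, String> deprecated = new LinkedHashMap<>();
  private final List<Predicate<T>> validators = new ArrayList<>();

  private VariableBuilder(Function<String, T> parser) { this.parser = parser; }

  static VariableBuilder<URI> newURI() { return new VariableBuilder<URI>(URI::create); }

  VariableBuilder<T> setDefault(T value) { defaultValue = value; return this; }
  VariableBuilder<T> addAccessor(String name) { key = name; return this; }
  VariableBuilder<T> addDeprecatedAccessor(String name, String advice) {
    deprecated.put(name, advice);
    return this;
  }
  VariableBuilder<T> addValidator(Predicate<T> v) { validators.add(v); return this; }

  BuiltVariable<T> build() {
    return new BuiltVariable<>(parser, defaultValue, key, deprecated, validators);
  }
}

class DeprecatedKeyDemo {
  public static void main(String[] args) {
    BuiltVariable<URI> fsDefaultName = VariableBuilder.newURI()
        .setDefault(null)
        .addAccessor("fs.default.name")
        .addDeprecatedAccessor("core.default.fs", "Use foo instead")
        .addValidator(uri -> uri.getScheme() != null) // stand-in validator
        .build();
    Map<String, String> conf = new HashMap<>();
    conf.put("core.default.fs", "hdfs://namenode:8020"); // only the old key is set
    System.out.println(fsDefaultName.get(conf));
    System.out.println(fsDefaultName.warnings);
  }
}
```

Callers only ever see fsDefaultName.get(conf); whether the value came from the new key or an old one is the variable's business, which is the backwards-compatibility behavior this ticket is after.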

One open problem with this approach is how to handle defaults that
refer to other variables.  Perhaps the easiest thing to do
is to handle the same syntax we support now, like 'addIndirectDefault("${default.dir}/mapred")',
but something that references the other variable directly is more appealing, e.g.: 'addIndirectDefault(OtherClass.class, "fieldname")'.
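
For the string-based variant, a minimal sketch of the expansion step might look like the following. The class IndirectDefaults, its two methods, and the key "sample.mapred.dir" are hypothetical, and a Map<String, String> again stands in for Configuration. The sketch does a single pass only, so a referenced value that itself contains a ${...} reference is not expanded further.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for defaults written as "${other.key}/suffix".
final class IndirectDefaults {
  private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

  private IndirectDefaults() {}

  // Replaces each ${name} with conf's value for 'name' (single pass).
  static String expand(String template, Map<String, String> conf) {
    Matcher m = VAR.matcher(template);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String value = conf.get(m.group(1));
      // Unknown references are left as written rather than guessed at.
      String replacement = value != null ? value : m.group(0);
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }

  // An explicitly set value wins; otherwise the template is expanded.
  static String getOrIndirectDefault(
      Map<String, String> conf, String key, String template) {
    String raw = conf.get(key);
    return raw != null ? raw : expand(template, conf);
  }
}

class IndirectDefaultsDemo {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("default.dir", "/tmp/hadoop");
    System.out.println(IndirectDefaults.getOrIndirectDefault(
        conf, "sample.mapred.dir", "${default.dir}/mapred"));
  }
}
```

The class-and-field form would avoid the string lookup entirely, at the cost of the variable needing a reference to the other ConfigVariable instance at construction time.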

I think this can be implemented relatively quickly, with little risk of
breaking existing code (because the old way of using Configuration continues to work).

What do you think?


> Provide a way to automatically handle backward compatibility of deprecated keys
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-6105
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6105
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>            Reporter: Hemanth Yamijala
>
> There are cases when we have had to deprecate configuration keys. Use cases include, changing the names of variables to better match intent, splitting a single parameter into two - for maps, reduces etc.
> In such cases, we typically provide a backwards compatible option for the old keys. The handling of such cases might typically be common enough to actually add support for it in a generic fashion in the Configuration class. Some initial discussion around this started in HADOOP-5919, but since the project split happened in between we decided to open this issue to fix it in common.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.