Posted to common-user@hadoop.apache.org by Henning Blohm <he...@zfabrik.de> on 2010/10/14 10:34:05 UTC

Problem with org.apache.hadoop.conf.Configuration.REGISTRY

This is a follow-up on a HBase mailing list discussion:

http://mail-archives.apache.org/mod_mbox/hbase-user/201010.mbox/%3C1286978976.4712.30.camel@expat%3E

When a Configuration that was populated via addResource(InputStream) is
reused, a later reload of the configuration fails because the stream has
already been read.

The reload is triggered when addDefaultResource is called. That method
uses the static REGISTRY WeakHashMap to reach every Configuration
instance that is still reachable and reset its properties.
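
To make the mechanism concrete, here is a small self-contained model of
it. This is only a sketch, not Hadoop's code: MiniConfiguration and
RegistryReloadDemo are made-up names, and java.util.Properties stands in
for the XML parsing. In this model the property silently disappears
after the reload; in the real case the reload fails, but the cause is the
same (the stream can only be read once).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.WeakHashMap;

// Simplified stand-in for the pattern described above; NOT Hadoop's actual code.
class MiniConfiguration {
    // Static registry of all live instances, held weakly (models REGISTRY).
    private static final Map<MiniConfiguration, Object> REGISTRY = new WeakHashMap<>();

    private final List<InputStream> resources = new ArrayList<>();
    private Properties properties; // null means "load again on next access"

    MiniConfiguration() {
        synchronized (MiniConfiguration.class) {
            REGISTRY.put(this, null);
        }
    }

    synchronized void addResource(InputStream in) {
        resources.add(in);
        properties = null;
    }

    // Models addDefaultResource: as a side effect, every registered
    // instance is told to reload. The name is ignored in this sketch.
    static synchronized void addDefaultResource(String name) {
        for (MiniConfiguration conf : REGISTRY.keySet()) {
            conf.reloadConfiguration();
        }
    }

    synchronized void reloadConfiguration() {
        properties = null; // only resets members; actual loading is lazy
    }

    synchronized String get(String key) {
        if (properties == null) {
            properties = new Properties();
            for (InputStream in : resources) {
                try {
                    // On a reload this reads from an already consumed stream.
                    properties.load(in);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return properties.getProperty(key);
    }
}

class RegistryReloadDemo {
    // Returns the value of "color" before and after the triggered reload.
    static String[] run() {
        MiniConfiguration conf = new MiniConfiguration();
        conf.addResource(new ByteArrayInputStream(
                "color=blue".getBytes(StandardCharsets.UTF_8)));
        String before = conf.get("color");
        // Some unrelated code registers a default resource.
        MiniConfiguration.addDefaultResource("other-defaults.xml");
        String after = conf.get("color");
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("before reload: " + r[0]); // before reload: blue
        System.out.println("after reload: " + r[1]);  // after reload: null
    }
}
```

Note that the code calling addDefaultResource never touches conf
directly; the static registry is what connects the two.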

The method addDefaultResource is called by e.g. ConfigUtil in
org.apache.hadoop.mapreduce.util (hadoop trunk) or JobConf (hadoop
0.20.2).

The problem has been observed in Hadoop 0.20.2 but the code in trunk has
essentially the same structure.

There are a few problems here:

1. You cannot safely use addResource(InputStream) if Configuration
objects are to be re-used (you can, however, use addResource(URL)
instead).

2. Modifying the state of Configuration instances at some later point in
time, as a side effect of a class initialization in a completely
unrelated thread, leads to unpredictable behavior (properties change
under the hood).

3. Configuration instances keep context classloaders to find resources.
After redeployment these may not be "valid" anymore. As long as a
Configuration instance has not been garbage collected, addDefaultResource
will still invoke reloadConfiguration on it. While that is harmless
today (it only resets members), this looks like a ticking time bomb.
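
Regarding point 1: if all you have is an InputStream, one possible
workaround is to spool it to a temporary file once and hand the
resulting URL to addResource(URL), since a URL can be opened again on
every reload. The sketch below uses only the JDK; ReopenableResource,
spoolToUrl and readAll are hypothetical helper names, not part of
Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: turns a one-shot stream into a re-openable URL.
class ReopenableResource {

    // Copies the stream into a temp file and returns a file: URL for it.
    static URL spoolToUrl(InputStream in) {
        try {
            Path tmp = Files.createTempFile("conf-resource-", ".xml");
            tmp.toFile().deleteOnExit();
            try (InputStream src = in) {
                Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
            }
            return tmp.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the URL afresh and returns its content as a UTF-8 string.
    static String readAll(URL url) {
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream oneShot = new ByteArrayInputStream(
                "<configuration/>".getBytes(StandardCharsets.UTF_8));
        URL url = spoolToUrl(oneShot);
        // Unlike the stream, the URL yields the same content every time.
        System.out.println(readAll(url));
        System.out.println(readAll(url));
        // With Hadoop on the classpath, the URL would then be passed to
        // Configuration.addResource(URL) instead of the stream.
    }
}
```

This only addresses point 1. It does nothing about points 2 and 3.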

Suggestion: 

Define all default resources in Configuration once. Do not hold on to
other configuration instances and do not modify their state as a side
effect of some other activity.

Thanks,
  Henning