You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Eric Sammer <er...@lifeless.net> on 2010/02/09 04:01:56 UTC

Static state in Configuration and elsewhere

All (notably commiters):

In walking through the source, it seems that there are a number of cases
where static state exists. o.a.h.conf.Configuration is a good example of
a case were there is a static cache (see Configuration.REGISTRY) as well
as some class level state for default resources. These static members
make some testing scenarios difficult and I think they confuse and
complicate certain issues such as object lifecycle management.

To that end, I started working on a patch to tease the registry
behavior, default resources, and other similar state and behavior into a
simple ConfigurationFactory that knows how to manufacture instances of
Configuration. For now, the Factory would act as a singleton but retains
a public constructor for simplifying testing and for what I hope is
easier integration with things like Spring down the road.

While working on this, I realized this might be too big of a departure
from the original design. Is the static state thing on purpose? Is it
interesting to the rest of the community to factor this out? Does it
bother / disrupt anyone else? Are these sorts of design related changes
welcome?

There are more classes than Configuration that exhibit this mixed
lifecycle sort of design, but I thought this was a "safe" place to start
as I see a way of maintaining backward compatibility for as a long as
necessary (although it's unpleasant, to be sure). At the same time, it
isn't entirely trivial given the pervasive nature of something like
Configuration so I don't want to get too deep if this sort of thing is
unwelcome.

Thanks in advance for comments, etc.
Regards.
-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com

Re: Static state in Configuration and elsewhere

Posted by "E. Sammer" <er...@lifeless.net>.
On 2/11/10 2:41 PM, Aaron Kimball wrote:
> There are an enormous number of examples of the following line in user-side
> code:
>
> Configuration conf = new Configuration();
>
> ... This is going to need to still work transparently after any refactoring.
> The new Configuration in this case needs to be populated with values from
> the appropriate defaults files.
>
> Maybe instead of using a singleton ConfigurationFactory (e.g.
> ConfigurationFactory.getConfigFactory().newConfiguration()), you could
> instead have Configuration's constructor use a service locator to get a
> configuration "populator."
>
> e.g:
>
> class ConfigServiceLocator {
>    static ConfigServiceLocator getServiceLocator(); /** this class is a
> singleton */
>    ConfigService getConfigService()
>    void setConfigService(ConfigService);
> }
>
> class ConfigService {
>    getAllThatStuffThatWasPreviouslyStaticState()
> }
>
> then in Configuration#Configuration() {
>
> initWith(ConfigServiceLocator.getServiceLocator().getAllThatStuffThatWasPreviouslyStaticState());
> }
>
> Then there can be a static block which initializes the ConfigServiceLocator
> with a default ConfigService instance that does everything as normal. For
> testing, though, you could instantiate a new ConfigService, put whatever
> state you want in it, and then update the ServiceLocator to use this new
> instance instead.
>
> Does that sound like a helpful alternative?
> - Aaron

I definitely see what you're getting at here and it is an alternative. 
My feeling is that there's still a weird static dependency that doesn't 
do exactly what you (I) want. There was someone once who said you should 
never modify the behavior of a constructor. A constructor should always 
produce a simple new instance. If you want different behavior, make the 
constructor private and use a static method to indicate non-standard 
behavior. I tend to think this is a good take. By not embedding 
static-dependent behavior you enable people to do more with dependency 
management. In the above example, there's a hard coded dep on 
ConfigServiceLocation within Configuration#Configuration with no point 
of interjection which is ultimately what I'm trying to avoid.

I'm sure this will end with there being simply too much existing inertia 
to change it, but it hurts us.

Now, all of that said, I think you could introduce the 
ConfigurationFactory and still allow people to directly create 
Configuration instances directly. The problem is that the Configuration 
constructor has the overlay logic where as it should probably be at 
least in a static method of Configuration (i.e. outside of the 
constructor proper).

One could do a phased replacement by introducing the CFactory, pushing 
code internally toward it, and eventually removing the layering within 
the Configuration, instead having the CF handle it (which I think is a 
better place for it), or at least opting for a static factory method 
within Configuration so people may use their own Factories (or Spring, 
for instance) externally.

Just jumping back for a second, I think the larger issue is the tendency 
to use static state in the code base. This is indicative of a larger 
pattern. I think that much of the code base can be simplified by having 
simpler objects at the potential expense of having a few extra classes.

I wanted to judge the relative reaction to these kinds of changes. I'm 
happy to submit patches even if they get rejected if it means people see 
the type of problem I'm trying to get at.

Thanks for your comments, Aaron!
-- 
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com

Re: Static state in Configuration and elsewhere

Posted by Aaron Kimball <aa...@cloudera.com>.
There are an enormous number of examples of the following line in user-side
code:

Configuration conf = new Configuration();

... This is going to need to still work transparently after any refactoring.
The new Configuration in this case needs to be populated with values from
the appropriate defaults files.

Maybe instead of using a singleton ConfigurationFactory (e.g.
ConfigurationFactory.getConfigFactory().newConfiguration()), you could
instead have Configuration's constructor use a service locator to get a
configuration "populator."

e.g:

class ConfigServiceLocator {
  static ConfigServiceLocator getServiceLocator(); /** this class is a
singleton */
  ConfigService getConfigService()
  void setConfigService(ConfigService);
}

class ConfigService {
  getAllThatStuffThatWasPreviouslyStaticState()
}

then in Configuration#Configuration() {

initWith(ConfigServiceLocator.getServiceLocator().getAllThatStuffThatWasPreviouslyStaticState());
}

Then there can be a static block which initializes the ConfigServiceLocator
with a default ConfigService instance that does everything as normal. For
testing, though, you could instantiate a new ConfigService, put whatever
state you want in it, and then update the ServiceLocator to use this new
instance instead.

Does that sound like a helpful alternative?
- Aaron





On Wed, Feb 10, 2010 at 3:09 AM, Steve Loughran <st...@apache.org> wrote:

> Eric Sammer wrote:
>
>> All (notably commiters):
>>
>> In walking through the source, it seems that there are a number of cases
>> where static state exists. o.a.h.conf.Configuration is a good example of
>> a case were there is a static cache (see Configuration.REGISTRY) as well
>> as some class level state for default resources. These static members
>> make some testing scenarios difficult and I think they confuse and
>> complicate certain issues such as object lifecycle management.
>>
>> To that end, I started working on a patch to tease the registry
>> behavior, default resources, and other similar state and behavior into a
>> simple ConfigurationFactory that knows how to manufacture instances of
>> Configuration. For now, the Factory would act as a singleton but retains
>> a public constructor for simplifying testing and for what I hope is
>> easier integration with things like Spring down the road.
>>
>> While working on this, I realized this might be too big of a departure
>> from the original design. Is the static state thing on purpose? Is it
>> interesting to the rest of the community to factor this out? Does it
>> bother / disrupt anyone else? Are these sorts of design related changes
>> welcome?
>>
>
> I have looked at this in the past, a "what would it take to move to a
> config factory", and concluded it would be pretty tricky undertaking. There
> are lot of places that create new Configuration instances.
>
> That said, I would like, it, after I've got my lifecycle patches up to date
> and in.
>
> (I'd also like things like datanode and tasktracker to reread from config
> things like the namenode and JT addresses whenever they have to reconnect to
> the master nodes, but that's even more complex)
>
>
>

Re: Static state in Configuration and elsewhere

Posted by Steve Loughran <st...@apache.org>.
Eric Sammer wrote:
> All (notably commiters):
> 
> In walking through the source, it seems that there are a number of cases
> where static state exists. o.a.h.conf.Configuration is a good example of
> a case were there is a static cache (see Configuration.REGISTRY) as well
> as some class level state for default resources. These static members
> make some testing scenarios difficult and I think they confuse and
> complicate certain issues such as object lifecycle management.
> 
> To that end, I started working on a patch to tease the registry
> behavior, default resources, and other similar state and behavior into a
> simple ConfigurationFactory that knows how to manufacture instances of
> Configuration. For now, the Factory would act as a singleton but retains
> a public constructor for simplifying testing and for what I hope is
> easier integration with things like Spring down the road.
> 
> While working on this, I realized this might be too big of a departure
> from the original design. Is the static state thing on purpose? Is it
> interesting to the rest of the community to factor this out? Does it
> bother / disrupt anyone else? Are these sorts of design related changes
> welcome?

I have looked at this in the past, a "what would it take to move to a 
config factory", and concluded it would be pretty tricky undertaking. 
There are lot of places that create new Configuration instances.

That said, I would like, it, after I've got my lifecycle patches up to 
date and in.

(I'd also like things like datanode and tasktracker to reread from 
config things like the namenode and JT addresses whenever they have to 
reconnect to the master nodes, but that's even more complex)