You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Eric Sammer <er...@lifeless.net> on 2010/02/09 04:01:56 UTC
Static state in Configuration and elsewhere
All (notably commiters):
In walking through the source, it seems that there are a number of cases
where static state exists. o.a.h.conf.Configuration is a good example of
a case were there is a static cache (see Configuration.REGISTRY) as well
as some class level state for default resources. These static members
make some testing scenarios difficult and I think they confuse and
complicate certain issues such as object lifecycle management.
To that end, I started working on a patch to tease the registry
behavior, default resources, and other similar state and behavior into a
simple ConfigurationFactory that knows how to manufacture instances of
Configuration. For now, the Factory would act as a singleton but retains
a public constructor for simplifying testing and for what I hope is
easier integration with things like Spring down the road.
While working on this, I realized this might be too big of a departure
from the original design. Is the static state thing on purpose? Is it
interesting to the rest of the community to factor this out? Does it
bother / disrupt anyone else? Are these sorts of design related changes
welcome?
There are more classes than Configuration that exhibit this mixed
lifecycle sort of design, but I thought this was a "safe" place to start
as I see a way of maintaining backward compatibility for as a long as
necessary (although it's unpleasant, to be sure). At the same time, it
isn't entirely trivial given the pervasive nature of something like
Configuration so I don't want to get too deep if this sort of thing is
unwelcome.
Thanks in advance for comments, etc.
Regards.
--
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com
Re: Static state in Configuration and elsewhere
Posted by "E. Sammer" <er...@lifeless.net>.
On 2/11/10 2:41 PM, Aaron Kimball wrote:
> There are an enormous number of examples of the following line in user-side
> code:
>
> Configuration conf = new Configuration();
>
> ... This is going to need to still work transparently after any refactoring.
> The new Configuration in this case needs to be populated with values from
> the appropriate defaults files.
>
> Maybe instead of using a singleton ConfigurationFactory (e.g.
> ConfigurationFactory.getConfigFactory().newConfiguration()), you could
> instead have Configuration's constructor use a service locator to get a
> configuration "populator."
>
> e.g:
>
> class ConfigServiceLocator {
> static ConfigServiceLocator getServiceLocator(); /** this class is a
> singleton */
> ConfigService getConfigService()
> void setConfigService(ConfigService);
> }
>
> class ConfigService {
> getAllThatStuffThatWasPreviouslyStaticState()
> }
>
> then in Configuration#Configuration() {
>
> initWith(ConfigServiceLocator.getServiceLocator().getAllThatStuffThatWasPreviouslyStaticState());
> }
>
> Then there can be a static block which initializes the ConfigServiceLocator
> with a default ConfigService instance that does everything as normal. For
> testing, though, you could instantiate a new ConfigService, put whatever
> state you want in it, and then update the ServiceLocator to use this new
> instance instead.
>
> Does that sound like a helpful alternative?
> - Aaron
I definitely see what you're getting at here and it is an alternative.
My feeling is that there's still a weird static dependency that doesn't
do exactly what you (I) want. There was someone once who said you should
never modify the behavior of a constructor. A constructor should always
produce a simple new instance. If you want different behavior, make the
constructor private and use a static method to indicate non-standard
behavior. I tend to think this is a good take. By not embedding
static-dependent behavior you enable people to do more with dependency
management. In the above example, there's a hard coded dep on
ConfigServiceLocation within Configuration#Configuration with no point
of interjection which is ultimately what I'm trying to avoid.
I'm sure this will end with there being simply too much existing inertia
to change it, but it hurts us.
Now, all of that said, I think you could introduce the
ConfigurationFactory and still allow people to directly create
Configuration instances directly. The problem is that the Configuration
constructor has the overlay logic where as it should probably be at
least in a static method of Configuration (i.e. outside of the
constructor proper).
One could do a phased replacement by introducing the CFactory, pushing
code internally toward it, and eventually removing the layering within
the Configuration, instead having the CF handle it (which I think is a
better place for it), or at least opting for a static factory method
within Configuration so people may use their own Factories (or Spring,
for instance) externally.
Just jumping back for a second, I think the larger issue is the tendency
to use static state in the code base. This is indicative of a larger
pattern. I think that much of the code base can be simplified by having
simpler objects at the potential expense of having a few extra classes.
I wanted to judge the relative reaction to these kinds of changes. I'm
happy to submit patches even if they get rejected if it means people see
the type of problem I'm trying to get at.
Thanks for your comments, Aaron!
--
Eric Sammer
eric@lifeless.net
http://esammer.blogspot.com
Re: Static state in Configuration and elsewhere
Posted by Aaron Kimball <aa...@cloudera.com>.
There are an enormous number of examples of the following line in user-side
code:
Configuration conf = new Configuration();
... This is going to need to still work transparently after any refactoring.
The new Configuration in this case needs to be populated with values from
the appropriate defaults files.
Maybe instead of using a singleton ConfigurationFactory (e.g.
ConfigurationFactory.getConfigFactory().newConfiguration()), you could
instead have Configuration's constructor use a service locator to get a
configuration "populator."
e.g:
class ConfigServiceLocator {
static ConfigServiceLocator getServiceLocator(); /** this class is a
singleton */
ConfigService getConfigService()
void setConfigService(ConfigService);
}
class ConfigService {
getAllThatStuffThatWasPreviouslyStaticState()
}
then in Configuration#Configuration() {
initWith(ConfigServiceLocator.getServiceLocator().getAllThatStuffThatWasPreviouslyStaticState());
}
Then there can be a static block which initializes the ConfigServiceLocator
with a default ConfigService instance that does everything as normal. For
testing, though, you could instantiate a new ConfigService, put whatever
state you want in it, and then update the ServiceLocator to use this new
instance instead.
Does that sound like a helpful alternative?
- Aaron
On Wed, Feb 10, 2010 at 3:09 AM, Steve Loughran <st...@apache.org> wrote:
> Eric Sammer wrote:
>
>> All (notably commiters):
>>
>> In walking through the source, it seems that there are a number of cases
>> where static state exists. o.a.h.conf.Configuration is a good example of
>> a case were there is a static cache (see Configuration.REGISTRY) as well
>> as some class level state for default resources. These static members
>> make some testing scenarios difficult and I think they confuse and
>> complicate certain issues such as object lifecycle management.
>>
>> To that end, I started working on a patch to tease the registry
>> behavior, default resources, and other similar state and behavior into a
>> simple ConfigurationFactory that knows how to manufacture instances of
>> Configuration. For now, the Factory would act as a singleton but retains
>> a public constructor for simplifying testing and for what I hope is
>> easier integration with things like Spring down the road.
>>
>> While working on this, I realized this might be too big of a departure
>> from the original design. Is the static state thing on purpose? Is it
>> interesting to the rest of the community to factor this out? Does it
>> bother / disrupt anyone else? Are these sorts of design related changes
>> welcome?
>>
>
> I have looked at this in the past, a "what would it take to move to a
> config factory", and concluded it would be pretty tricky undertaking. There
> are lot of places that create new Configuration instances.
>
> That said, I would like, it, after I've got my lifecycle patches up to date
> and in.
>
> (I'd also like things like datanode and tasktracker to reread from config
> things like the namenode and JT addresses whenever they have to reconnect to
> the master nodes, but that's even more complex)
>
>
>
Re: Static state in Configuration and elsewhere
Posted by Steve Loughran <st...@apache.org>.
Eric Sammer wrote:
> All (notably commiters):
>
> In walking through the source, it seems that there are a number of cases
> where static state exists. o.a.h.conf.Configuration is a good example of
> a case were there is a static cache (see Configuration.REGISTRY) as well
> as some class level state for default resources. These static members
> make some testing scenarios difficult and I think they confuse and
> complicate certain issues such as object lifecycle management.
>
> To that end, I started working on a patch to tease the registry
> behavior, default resources, and other similar state and behavior into a
> simple ConfigurationFactory that knows how to manufacture instances of
> Configuration. For now, the Factory would act as a singleton but retains
> a public constructor for simplifying testing and for what I hope is
> easier integration with things like Spring down the road.
>
> While working on this, I realized this might be too big of a departure
> from the original design. Is the static state thing on purpose? Is it
> interesting to the rest of the community to factor this out? Does it
> bother / disrupt anyone else? Are these sorts of design related changes
> welcome?
I have looked at this in the past, a "what would it take to move to a
config factory", and concluded it would be pretty tricky undertaking.
There are lot of places that create new Configuration instances.
That said, I would like, it, after I've got my lifecycle patches up to
date and in.
(I'd also like things like datanode and tasktracker to reread from
config things like the namenode and JT addresses whenever they have to
reconnect to the master nodes, but that's even more complex)