Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/01/04 15:39:38 UTC

no static NutchConf

Hi,
to move toward having a Nutch GUI, I would love to start removing the
static access to NutchConf.
Based on experience, I would first like to get general agreement and a
'go' before wasting too much time on a solution that is not accepted.

I suggest:

+ Removing NutchConf.get().
+ If a lower-level object uses only one or two, but no more than three,
parameters from the Nutch configuration, we add these parameters to the
constructor of that object
(e.g. MapFile.Reader needs only the parameter INDEX_SKIP).
+ For higher-level objects like the fetcher tool, which need more than
three parameters for their lower-level objects, we add an instance of
NutchConf to the constructor.
+ For all dynamically used objects that implement a specific interface
(interface > no control over the object constructor), we use the
Configurable interface to set the NutchConf in an
inversion-of-control style
(e.g. plugin extension implementations like Parser or Protocol).
+ The plugin registry will no longer be a singleton but will get a
constructor that takes a NutchConf instance.
+ Getting an extension also requires a NutchConf, which is injected if
the extension object (e.g. a Parser) implements the Configurable
interface.
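
The points above could be sketched roughly as follows. This is only an
illustration, not the actual Nutch code: SketchConf, SketchConfigurable,
SketchReader, SketchParser and SketchRegistry, as well as the keys
"sketch.index.skip" and "sketch.parser.timeout", are hypothetical names
invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf: a plain instance, no static access.
class SketchConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    int getInt(String key, int defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}

// Assumed shape of the Configurable interface mentioned in the list.
interface SketchConfigurable {
    void setConf(SketchConf conf);
}

// Low-level object: receives the one value it needs, not the whole conf.
class SketchReader {
    final int indexSkip;
    SketchReader(int indexSkip) { this.indexSkip = indexSkip; }
}

// Extension created by the plugin system, so it has no constructor arguments.
class SketchParser implements SketchConfigurable {
    private SketchConf conf;
    public void setConf(SketchConf conf) { this.conf = conf; }
    int timeoutSeconds() { return conf.getInt("sketch.parser.timeout", 30); }
}

// Non-singleton registry: one instance per configuration.
class SketchRegistry {
    private final SketchConf conf;
    SketchRegistry(SketchConf conf) { this.conf = conf; }

    // Instantiates the extension and injects the conf before returning it.
    <T> T newExtension(Class<T> type) {
        try {
            T extension = type.getDeclaredConstructor().newInstance();
            if (extension instanceof SketchConfigurable) {
                ((SketchConfigurable) extension).setConf(conf);
            }
            return extension;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + type.getName(), e);
        }
    }
}

class ConfInjectionDemo {
    public static void main(String[] args) {
        SketchConf conf = new SketchConf();
        conf.set("sketch.parser.timeout", "5");
        // The caller reads the single value and hands it to the low-level object.
        SketchReader reader = new SketchReader(conf.getInt("sketch.index.skip", 0));
        // The registry injects the whole conf into the extension.
        SketchParser parser = new SketchRegistry(conf).newExtension(SketchParser.class);
        System.out.println(reader.indexSkip + " " + parser.timeoutSeconds());
    }
}
```

With this shape, nothing in the sketch reaches for a static configuration,
so two registries built from two different conf instances can live side by
side.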

Any comments, improvement suggestions, more use cases?
I would love to do this job; can I get a go from the other developers?
From my point of view NutchConf is currently a showstopper, since a
lot of people run into trouble integrating Nutch into other projects.
My suggestions are also required for writing a Nutch GUI.

Stefan



Re: no static NutchConf

Posted by Thomas Jaeger <nu...@thjaeger.org>.
Doug Cutting wrote:
> Stefan Groschupf wrote:
> 
>>> I have two more ideas:
>>> 1) create NutchConf as interface (not class)
>>> 2) make it work as plugin
>>
>>
>> I like the idea to make the conf as a singleton and understand the 
>> need to be able to integrate nutch.
>> However I would love to do one first step and later on we can make 
>> this second step. I made the experience that if you change to much 
>> people do not accept your patch.
> 
> 
> +1
> 
> I don't see a big advantage in trying to make both of these changes at
> the same time.  And, when possible, small incremental changes are easier
> for the community to process.

I never intended to make these changes all at once. These were just some
thoughts on how to improve the nutch configuration. I agree with Stefan
on this point.


Thomas



Re: no static NutchConf

Posted by Doug Cutting <cu...@nutch.org>.
Stefan Groschupf wrote:
>> I have two more ideas:
>> 1) create NutchConf as interface (not class)
>> 2) make it work as plugin
> 
> I like the idea to make the conf as a singleton and understand the  need 
> to be able to integrate nutch.
> However I would love to do one first step and later on we can make  this 
> second step. I made the experience that if you change to much  people do 
> not accept your patch.

+1

I don't see a big advantage in trying to make both of these changes at 
the same time.  And, when possible, small incremental changes are easier 
for the community to process.

Doug

Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
> I have two more ideas:
> 1) create NutchConf as interface (not class)
> 2) make it work as plugin

I like the idea to make the conf as a singleton and understand the  
need to be able to integrate nutch.
However, I would love to take one step first, and later on we can take
this second step. In my experience, if you change too much, people do
not accept your patch.
This is painful, since you invest some days of work and in the end
your time is wasted and the work lands in the trash.
So let's add this to JIRA as an improvement suggestion and do this
step after the current change.

Stefan 

Re: no static NutchConf

Posted by Thomas Jaeger <nu...@thjaeger.org>.
Hi,

Stefan Groschupf wrote:
[...]
> Any comments, improvement suggestions, more use-cases?

I completely agree with you.

I have two more ideas:
1) create NutchConf as interface (not class)
2) make it work as plugin

1) If NutchConf is an interface, the NutchConf implementation can be
written with a hashmap in mind (like now) or with JMX or
commons-configuration.
2) There are only 4 required configuration options (plugin.excludes,
plugin.includes, plugin.folders, plugin.auto-activation) the plugin
registry needs to start up. If these options are provided by a bootstrap
configuration, configuration plugins will be possible.
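
A rough sketch of both ideas, under assumptions: ConfSketch,
MapConfSketch and LayeredConfSketch are hypothetical names, and the key
"sketch.max.depth" is invented. Only the four plugin.* keys come from the
description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface form of NutchConf; only string lookup is shown.
interface ConfSketch {
    String get(String key, String defaultValue);
}

// Hashmap-backed implementation, similar in spirit to the current class.
class MapConfSketch implements ConfSketch {
    private final Map<String, String> values;

    MapConfSketch(Map<String, String> values) { this.values = new HashMap<>(values); }

    public String get(String key, String defaultValue) {
        return values.getOrDefault(key, defaultValue);
    }
}

// A conf provided later (e.g. by a configuration plugin) that falls back
// to the bootstrap conf for anything it does not define itself.
class LayeredConfSketch implements ConfSketch {
    private final ConfSketch primary;
    private final ConfSketch bootstrap;

    LayeredConfSketch(ConfSketch primary, ConfSketch bootstrap) {
        this.primary = primary;
        this.bootstrap = bootstrap;
    }

    public String get(String key, String defaultValue) {
        String value = primary.get(key, null);
        return value != null ? value : bootstrap.get(key, defaultValue);
    }
}

class ConfInterfaceDemo {
    public static void main(String[] args) {
        // Bootstrap conf: just enough for the plugin registry to start.
        Map<String, String> boot = new HashMap<>();
        boot.put("plugin.folders", "plugins");
        boot.put("plugin.includes", ".*");
        boot.put("plugin.excludes", "");
        boot.put("plugin.auto-activation", "true");

        Map<String, String> fromPlugin = new HashMap<>();
        fromPlugin.put("sketch.max.depth", "3"); // invented key

        ConfSketch conf = new LayeredConfSketch(
                new MapConfSketch(fromPlugin), new MapConfSketch(boot));
        System.out.println(conf.get("plugin.folders", null));
        System.out.println(conf.get("sketch.max.depth", "1"));
    }
}
```

A JMX or commons-configuration backend would be another implementation of
the same interface; neither is shown here.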

If help is needed, I would like to write a JMX implementation of
NutchConf (since I will need it myself ;).


Regards,

Thomas

Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
> Yes. I thought that the call of the method setConf only set the  
> NutchConf. This is a philosophical question. All right, the  
> implementation class can also load/set the fields.

Some MapReduce classes already use this mechanism. See, for example,
CrawlDbReducer, which has a configure method, although there the
JobConfigurable interface is used.

Stefan 

Re: no static NutchConf

Posted by Marko Bauhardt <mb...@media-style.com>.
On 08.01.2006 at 16:08, Stefan Groschupf wrote:

> Marko,
> as mentioned...
> All these classes will implement the NutchConfigurable interface.  
> The plugin system will instantiate these objects and inject the  
> nutch configuration object *BEFORE* it will return the object  
> instance to the caller object.
> So we can be sure that setConf is called before any e.g. parse  
> method is called.

That's right.

> So the answer is the fields will be setted / intialized in the  
> setConf method that need to be implemented by each extension class  
> and we have the agreement that this method is called directly after  
> the constructor but before any other call.
> Does that clarify my suggestion?

Yes. I had thought that calling setConf would only set the NutchConf.
This is a philosophical question. All right, the implementation class
can also load and set its fields there.

Thanks, Marko


Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
Marko,
as mentioned...
All these classes will implement the NutchConfigurable interface. The
plugin system will instantiate these objects and inject the Nutch
configuration object *BEFORE* it returns the object instance to the
caller.
So we can be sure that setConf is called before any other method,
e.g. parse, is called.
So the answer is that the fields will be set and initialized in the
setConf method, which needs to be implemented by each extension class,
and we have the agreement that this method is called directly after
the constructor but before any other call.
Does that clarify my suggestion?
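
As a minimal sketch of that agreement (all names here are hypothetical:
CountingConf, TitleFilterSketch, and the key "sketch.max.title.length"
are made up for illustration), an extension can read its values once in
setConf and use only plain fields afterwards:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical conf that counts lookups, to show how often values are read.
class CountingConf {
    private final Map<String, String> props = new HashMap<>();
    int lookups = 0;

    void set(String key, String value) { props.put(key, value); }

    int getInt(String key, int defaultValue) {
        lookups++;
        String value = props.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}

// Hypothetical extension: fields are loaded in setConf, not in filter().
class TitleFilterSketch {
    private int maxTitleLength;

    // Assumed to be called by the plugin system right after construction.
    public void setConf(CountingConf conf) {
        this.maxTitleLength = conf.getInt("sketch.max.title.length", 100);
    }

    // Hot path: uses the cached field, never touches the conf.
    String filter(String title) {
        return title.length() > maxTitleLength
                ? title.substring(0, maxTitleLength)
                : title;
    }
}

class SetConfDemo {
    public static void main(String[] args) {
        CountingConf conf = new CountingConf();
        conf.set("sketch.max.title.length", "3");

        TitleFilterSketch filter = new TitleFilterSketch();
        filter.setConf(conf); // done once by the plugin system

        for (int i = 0; i < 1000; i++) {
            filter.filter("abcdef");
        }
        System.out.println("conf lookups: " + conf.lookups);
    }
}
```

The performance concern about reloading fields on every "parse" or
"filter" call does not arise in this shape, because the conf is read only
inside setConf.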

Stefan



On 08.01.2006 at 15:49, Marko Bauhardt wrote:

>> + Getting a Extension, require also a NutchConf that is injected  
>> in case the Extension Object (e.g. a Parser) implements a  
>> Configurable interface.
>>
>
> I think this is a good idea. But many plugins like  
> BasicIndexingFilter or ExtParse require some fileds in the "parse"  
> or "filter" method. These fields are  load over the static way  
> (over static NutchConf or static blocks). And this is ok, because  
> the fields are load only one time. If we load the fields in the  
> "parse" or "filter" methods, the fields would be load many times.  
> And this is a performance problem.
> The initialization of the fields over the constructor does not  
> work, because setConf() is calling after the constructor.
>
> Should we add a method like "loadNutchConfiguration()" to the  
> NutchConfigurable interface, to load the NutchConfiguration  
> Parameter? Hm, i don't know.
> Should the fields are loading in the setConf() method? Hm, the name  
> of the method says: set the NutchConf and not load the required  
> NutchConfiguration-Parameter.
> Has anyone an other elegant solution?
>
> Marko
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net



Re: no static NutchConf

Posted by Marko Bauhardt <mb...@media-style.com>.
> + Getting a Extension, require also a NutchConf that is injected in  
> case the Extension Object (e.g. a Parser) implements a Configurable  
> interface.
>

I think this is a good idea. But many plugins like
BasicIndexingFilter or ExtParse require some fields in the "parse" or
"filter" method. These fields are loaded statically (via the static
NutchConf or static blocks). And this is OK, because the fields are
loaded only once. If we loaded the fields in the "parse" or "filter"
methods, they would be loaded many times, and this is a performance
problem.
Initializing the fields in the constructor does not work, because
setConf() is called after the constructor.

Should we add a method like "loadNutchConfiguration()" to the
NutchConfigurable interface, to load the NutchConfiguration
parameters? Hm, I don't know.
Should the fields be loaded in the setConf() method? Hm, the name of
the method says: set the NutchConf, not load the required
NutchConfiguration parameters.
Does anyone have another elegant solution?

Marko


Re: no static NutchConf

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:

> Hi Andrzej,
> may be  I come closer to your idea of caching some objects.
>
>> Yes. If you remember our discussion, I'd like also to follow a  
>> pattern where such instances are cached inside this NutchConf  
>> instance, if appropriate (i.e. if they are reusable and multi- 
>> threaded).
>
>
> As mentioned I think it makes no sense to cache things like plugin  
> extension object, but what you think about caching the  
> PluginRepository that was already created with this specific  
> configuration instance.
> Of course we can not serialize this, but I guess this will improve  
> the performance somehow, since we do not need to scan the plugin  
> folder and time.


Yes, I agree on both counts. :-)

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Andrzej,
maybe I am coming closer to your idea of caching some objects.

> Yes. If you remember our discussion, I'd like also to follow a  
> pattern where such instances are cached inside this NutchConf  
> instance, if appropriate (i.e. if they are reusable and multi- 
> threaded).

As mentioned, I think it makes no sense to cache things like plugin
extension objects, but what do you think about caching the
PluginRepository that was already created with this specific
configuration instance?
Of course we cannot serialize this, but I guess it will improve
performance somewhat, since we would not need to scan the plugin
folder every time.
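
One possible shape for such a cache, as a sketch only: CachingConf and
RepoSketch are hypothetical names, and the "scan" is simulated by a
counter. The cached object lives in a side map of the conf instance that
is not part of the properties and so would not be serialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical conf with a per-instance object cache next to the properties.
class CachingConf {
    private final Map<String, String> props = new HashMap<>();
    // Runtime-only objects; deliberately separate from the properties.
    private final Map<String, Object> objects = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }

    // Returns the cached object for this conf, creating it on first use.
    synchronized Object cached(String name, Function<CachingConf, Object> factory) {
        Object value = objects.get(name);
        if (value == null) {
            value = factory.apply(this);
            objects.put(name, value);
        }
        return value;
    }
}

// Hypothetical repository; the constructor stands in for the folder scan.
class RepoSketch {
    static int scans = 0;
    final String folders;

    private RepoSketch(CachingConf conf) {
        scans++; // the expensive plugin-folder scan would happen here
        this.folders = conf.get("plugin.folders", "plugins");
    }

    // One repository per conf instance.
    static RepoSketch get(CachingConf conf) {
        return (RepoSketch) conf.cached(RepoSketch.class.getName(), RepoSketch::new);
    }
}

class RepoCacheDemo {
    public static void main(String[] args) {
        CachingConf a = new CachingConf();
        CachingConf b = new CachingConf();
        b.set("plugin.folders", "my.plugins, plugins");

        RepoSketch.get(a);
        RepoSketch.get(a); // served from the cache, no second scan
        RepoSketch.get(b); // different conf, separate repository
        System.out.println("scans: " + RepoSketch.scans);
    }
}
```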



Stefan 

Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
> But I like the direction and will not oppose against passing the  
> whole NutchConf in this case.
OK, then we will pass the NutchConf in the constructor.
It is a lot of work and may take some time.


Re: no static NutchConf

Posted by Piotr Kosiorowski <pk...@gmail.com>.
+1 in general
In fact I like the approach presented by Stefan of passing only the
required parameters, instead of NutchConf, to objects that have a small
number of configurable params. It makes it obvious which parameters are
required for such basic objects to run, and as they are usually building
blocks for something bigger, it makes it easier to reuse them with
different params in different parts of the code. But I like the
direction and will not oppose passing the whole NutchConf in this case.
Regards
Piotr

Re: no static NutchConf

Posted by Jérôme Charron <je...@gmail.com>.
> Another use case for eliminating the static uses of NutchConf is to
> simplify the construction of a configuration gui.  It would be nice to
> have a web-based interface which permits one to configure parameters and
> then have it run the system.

Yes, it is a really needed feature.


>   This should be able to run multiple Nutch
> instances in a single JVM.  For example, a single Nutch-based "search
> appliance" daemon should be able to crawl and search both your intranet
> and your public websites, each configured separately.

OK, but why not use two JVMs in such a case?

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: no static NutchConf

Posted by ilango gurusamy <il...@yahoo.com>.
Stefan
  I would like to help you with your project on the Nutch-based search
  appliance daemon. The reason is that I want to gain experience and
  learn. I have started playing around with Nutch: I wrote a scraper in
  Perl, and now I am trying to run one of the sample plugins too.
  
  ilango

Stefan Groschupf <sg...@media-style.com> wrote:
> Another use case for eliminating the static uses of NutchConf is to
> simplify the construction of a configuration gui.  It would be nice  
> to have a web-based interface which permits one to configure  
> parameters and then have it run the system.  This should be able to  
> run multiple Nutch instances in a single JVM.  For example, a  
> single Nutch-based "search appliance" daemon should be able to  
> crawl and search both your intranet and your public websites, each  
> configured separately.

Well this is my long term goal, I have to do that for my project in  
any case. :-)

Stefan




			

Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
> Another use case for eliminating the static uses of NutchConf is to  
> simplify the construction of a configuration gui.  It would be nice  
> to have a web-based interface which permits one to configure  
> parameters and then have it run the system.  This should be able to  
> run multiple Nutch instances in a single JVM.  For example, a  
> single Nutch-based "search appliance" daemon should be able to  
> crawl and search both your intranet and your public websites, each  
> configured separately.

Well this is my long term goal, I have to do that for my project in  
any case. :-)

Stefan


Re: no static NutchConf

Posted by Doug Cutting <cu...@nutch.org>.
Andrzej Bialecki wrote:
> Example: what happens now if you try to run more than one fetcher at the 
> same time, where the fetcher parameters differ (or a set of activated 
> plugins differs)? You can't - the local tasks on each tasktracker will 
> use whatever local config is there.

That's true when mapred.job.tracker=local, but when things are 
distributed the config can vary since each task is spawned in a separate 
JVM with a separate classpath.  The nutch-site.xml on each node can 
never be overridden.  For example, so long as plugin.includes is not
specified in nutch-site.xml on each node, then each task can override 
plugin.includes to use different plugins.

Also note that plugin implementations can be submitted in a jar file with
the job, and plugin.folders can be overridden in the job to find the new 
plugins.  So a job jar might include a folder named "my.plugins" and set 
plugin.folders to "my.plugins, plugins", then alter plugin.includes to 
include job-specific plugins.

> What happens if you change the 
> config on a node that  submits the job? The changes won't be propagated 
> to the tasktracker nodes, because tasktrackers use local configuration 
> (through a singleton NutchConf.get()), instead of supplying a 
> serialized/deserialized instance of the config from the originating 
> node... etc.

Again, I'm not sure this is a problem.  Properties which tasks should be 
able to override should not be specified in nutch-site.xml, but rather 
in mapred-default.xml.  Lots of job-specific properties are currently 
passed this way.
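
The override rules described above can be pictured with a small lookup
sketch. It is an illustration of the precedence only, not the actual
Nutch or MapReduce code: OverrideSketch and its three maps are
hypothetical, and the values are examples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup order: a value set in the node's site
// file wins, otherwise the job's value, otherwise the default.
class OverrideSketch {
    final Map<String, String> defaults = new HashMap<>(); // overridable defaults
    final Map<String, String> job = new HashMap<>();      // sent with the job
    final Map<String, String> site = new HashMap<>();     // node-local, never overridden

    String resolve(String key) {
        if (site.containsKey(key)) {
            return site.get(key);
        }
        if (job.containsKey(key)) {
            return job.get(key);
        }
        return defaults.get(key);
    }
}

class OverrideDemo {
    public static void main(String[] args) {
        OverrideSketch conf = new OverrideSketch();
        conf.defaults.put("plugin.folders", "plugins");

        // The job ships its own plugin folder and points plugin.folders at it.
        conf.job.put("plugin.folders", "my.plugins, plugins");
        System.out.println(conf.resolve("plugin.folders"));

        // If the node's site file pins a value, the job cannot change it.
        conf.site.put("plugin.includes", "pinned-by-node");
        conf.job.put("plugin.includes", "wanted-by-job");
        System.out.println(conf.resolve("plugin.includes"));
    }
}
```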

Another use case for eliminating the static uses of NutchConf is to 
simplify the construction of a configuration gui.  It would be nice to 
have a web-based interface which permits one to configure parameters and 
then have it run the system.  This should be able to run multiple Nutch 
instances in a single JVM.  For example, a single Nutch-based "search 
appliance" daemon should be able to crawl and search both your intranet 
and your public websites, each configured separately.

Doug

Re: no static NutchConf

Posted by Andrzej Bialecki <ab...@getopt.org>.
Jérôme Charron wrote:

>>>Excuse me in advance, I probably missed something, but what are the use
>>>cases for having many NutchConf instances with different values?
>>>      
>>>
>>Running many different tasks in parallel, each using different config,
>>inside the same JVM.
>>    
>>
>
>Ok, I understand this Andrzej, but it is not really what I call a use case.
>It is more a feature that you describe here.
>In fact, what I mean is that I don't understand in which cases it will be
>usefull. And I don't understand how a particular
>NutchConfig will be selected for a particular task...
>  
>

Use case: executing multiple tasks on any single tasktracker node, but 
with drastically different configurations per each task.

Example: what happens now if you try to run more than one fetcher at the 
same time, where the fetcher parameters differ (or a set of activated 
plugins differs)? You can't - the local tasks on each tasktracker will 
use whatever local config is there. What happens if you change the 
config on a node that  submits the job? The changes won't be propagated 
to the tasktracker nodes, because tasktrackers use local configuration 
(through a singleton NutchConf.get()), instead of supplying a 
serialized/deserialized instance of the config from the originating 
node... etc.

NutchConf instances will be created when you create a JobConf. Then they 
will have to be serialized/deserialized when job descriptors are sent by 
jobtracker to tasktrackers on mapred nodes, and used locally by 
tasktrackers to instantiate local tasks using copies of the original 
NutchConf instance.
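
To make the serialize/deserialize step concrete, here is a minimal
sketch. WireConf is a hypothetical class, and java.util.Properties is
used only as a convenient wire format for the example; the real job
descriptor format is not shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical conf that can be copied to another node as bytes.
class WireConf {
    private final Properties props = new Properties();

    void set(String key, String value) { props.setProperty(key, value); }

    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    // Serialize on the submitting node.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.store(out, null);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize on the receiving node into an independent copy.
    static WireConf fromBytes(byte[] data) {
        try {
            WireConf copy = new WireConf();
            copy.props.load(new ByteArrayInputStream(data));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class WireConfDemo {
    public static void main(String[] args) {
        WireConf jobConf = new WireConf();
        jobConf.set("plugin.includes", "protocol-http|parse-html"); // example value

        byte[] wire = jobConf.toBytes();            // sent with the job descriptor
        WireConf taskConf = WireConf.fromBytes(wire); // rebuilt for the local task

        System.out.println(taskConf.get("plugin.includes", ""));
    }
}
```

Each task then works on its own copy, so two tasks on the same node can
carry different values without touching a shared static configuration.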

-- 
Best regards,
Andrzej Bialecki     <><



Re: no static NutchConf

Posted by Jérôme Charron <je...@gmail.com>.
> >Excuse me in advance, I probably missed something, but what are the use
> >cases for having many NutchConf instances with different values?
> Running many different tasks in parallel, each using different config,
> inside the same JVM.

OK, I understand this, Andrzej, but it is not really what I call a use
case. It is more a feature that you describe here.
What I mean is that I don't understand in which cases it would be
useful. And I don't understand how a particular NutchConf would be
selected for a particular task...

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: no static NutchConf

Posted by Andrzej Bialecki <ab...@getopt.org>.
Jérôme Charron wrote:

>Excuse me in advance, I probably missed something, but what are the use
>cases for having many NutchConf instances with different values?
>  
>

Running many different tasks in parallel, each using different config, 
inside the same JVM.

-- 
Best regards,
Andrzej Bialecki     <><



Re: no static NutchConf

Posted by Jérôme Charron <je...@gmail.com>.
> My  idea is to be able using low level things outside of nutch also.
> It is may a philosophically question in case of the map file writer
> you pass a complete hashmap with a bunch of properties to the object,
> but the objects only reads one int from this hashmap. I personal
> don't like to use a hashmap to 'transport' just one value.

Yes Stefan, but passing only the NutchConf in the constructor
1. avoids breaking compatibility if a new parameter is used in a future
version of the constructor, and
2. gives control of default values to the class itself instead of the
calling object.
I think that we can accept the general convention that all NutchConfigurable
objects must provide a constructor with a single NutchConf parameter.

Excuse me in advance, I probably missed something, but what are the use
cases for having many NutchConf instances with different values?

Regards

Jérôme

Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
Hey Steve,

> Eclipse has a very good pattern for handling configuration for each  
> of the
> components. Basically each component is responsible for its own
> configuration, and the tool just provides the framework to allow the
> configuration to be displayed, updated, and stored.


I know the Eclipse configuration mechanism, but we have a different
case with Nutch.
The Eclipse mechanism does not allow running two Eclipse instances in
the same JVM. For Eclipse that makes no sense, but for Nutch it makes
a lot of sense (e.g. having search engines for different parts of a
corporate intranet on one box).
The Eclipse mechanism is a kind of singleton configuration, and each
component (each Eclipse plugin) loads what it is interested in. For
Nutch we need to pass the configuration properties down the call
stack, to be able to run two fetchers with different configurations
and to have two instances of the same parser plugin with different
configuration values.
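
A small sketch of what "passing the configuration down the call stack"
buys: InstanceConf and FetcherSketch are hypothetical names, and the keys
and values are used here only as examples.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-instance configuration (no static accessor).
class InstanceConf {
    private final Map<String, String> props = new HashMap<>();

    InstanceConf set(String key, String value) {
        props.put(key, value);
        return this;
    }

    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        String value = props.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}

// Hypothetical fetcher that only sees the conf it was constructed with.
class FetcherSketch {
    private final String agent;
    private final int threads;

    FetcherSketch(InstanceConf conf) {
        this.agent = conf.get("http.agent.name", "sketch-agent");
        this.threads = conf.getInt("fetcher.threads.fetch", 10);
    }

    String describe() {
        return agent + " with " + threads + " threads";
    }
}

class TwoInstancesDemo {
    public static void main(String[] args) {
        // Two differently configured fetchers living in the same JVM.
        FetcherSketch intranet = new FetcherSketch(new InstanceConf()
                .set("http.agent.name", "intranet-bot")
                .set("fetcher.threads.fetch", "2"));
        FetcherSketch internet = new FetcherSketch(new InstanceConf()
                .set("http.agent.name", "public-bot")
                .set("fetcher.threads.fetch", "50"));

        System.out.println(intranet.describe());
        System.out.println(internet.describe());
    }
}
```

With a static singleton, the second fetcher would silently see the first
one's settings; with instance passing, each object keeps its own.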


Stefan


  

RE: no static NutchConf

Posted by Steve Betts <sb...@minethurn.com>.
If you are going to be able to reconfigure a nutch component at runtime, you
need to remove any configuration from the constructor and have a method that
allows you to get/set the configuration for the component. The problem with
keeping the entire configuration in a single component is displaying and
filtering the configuration information so that the user knows which
component is being configured.

Eclipse has a very good pattern for handling configuration for each of the
components. Basically each component is responsible for its own
configuration, and the tool just provides the framework to allow the
configuration to be displayed, updated, and stored.

The drawback of that approach is that you really don't have a GUI, or at
least have to be able to run without one.

I think that, at the very least, removing the configuration information from
the constructor is the first step.  You can still have a properties object
set the configuration. Then we can discuss the relative merits of
displaying, changing, and storing the configuration.  (Like, how a user is
supposed to know what component is affected by which property.)

Thanks,

Steve Betts
sbetts@minethurn.com
937-477-1797



Re: no static NutchConf

Posted by Stefan Groschupf <sg...@media-style.com>.
>
> I don't fully agree with this. In most such cases, you already have  
> a NutchConf instance in the method or class context, so it makes  
> sense to use it in the constructor. You could add these construtors  
> with all parameters iterated, but I'd expect that the constructors  
> using NutchConf would be used most frequently.

My idea is to be able to use low-level things outside of Nutch as well.
It may be a philosophical question: in the case of the map file writer
you pass a complete hashmap with a bunch of properties to the object,
but the object reads only one int from this hashmap. I personally
don't like using a hashmap to 'transport' just one value.

So my suggestion looks like:
new MapFile.Reader(parameterA, nutchConf.getInt("parameterKey", 0));
If I understand you correctly, you prefer:
new MapFile.Reader(parameterA, nutchConf);
...
// inside MapFile.Reader: the constructor reads the value itself
public Reader(..., NutchConf nutchConf) {
	this.parameter = nutchConf.getInt("parameterKey", 0);
}

As mentioned, this is more a question of code philosophy and is not
important to me; my only idea was to decouple things as much as
possible if we touch them anyway.

>> + Getting a Extension, require also a NutchConf that is injected  
>> in  case the Extension Object (e.g. a Parser) implements a  
>> Configurable  interface.
>
>
> Yes. If you remember our discussion, I'd like also to follow a  
> pattern where such instances are cached inside this NutchConf  
> instance, if appropriate (i.e. if they are reusable and multi- 
> threaded).


I'm afraid I still do not clearly understand your idea here. As
discussed, from my point of view it makes no sense to cache any
objects in a NutchConf.
In particular, extension implementations like parsers are
multithreaded and exist as many times as we have threads. Caching
would make more sense behind the scenes of the plugin registry, but it
may be difficult, since you can run into trouble with resource
life-cycle management. PluginClass instances are already cached and
work like a kind of singleton for each existing plugin registry.
I also see some trouble with this caching mechanism because NutchConf
can be serialized. I actually have no idea where serialization is
used, but I guess distributed MapReduce will use it heavily.
So the cached objects would need to be Serializable as well.

Stefan


Re: no static NutchConf

Posted by Andrzej Bialecki <ab...@getopt.org>.
Stefan Groschupf wrote:

> Hi,
> to move forward in the direction of having a nutch gui, I would love  
> to start removing the static access of NutchConf.
> Based on experience first I would love to get a kind of general  
> agreement and a 'go' before wasting to much time for an unaccented  
> solution.


I agree with the general direction. Some comments below:

>
> I suggest:
>
> + removing NutchConf.get().


I'm not sure about this... Somewhere you need to instantiate the default 
config, and this looks like a good place.

> + in case a lower level object use only one, two but not more than 3  
> parameters from the nutch configuration, we add this parameter to the  
> constructor of this object.
> (e.g. MapFile.Reader needs only the parameter INDEX_SKIP)


I don't fully agree with this. In most such cases, you already have a
NutchConf instance in the method or class context, so it makes sense to
use it in the constructor. You could add constructors with all
parameters listed individually, but I'd expect that the constructors
using NutchConf would be used most frequently.

> + for higher level objects like fetcher tool- that need more than 3  
> parameters for the lower level object -  we add a instance of  
> NutchConf to the Constructor


Ok.

> + for all dynamic used object that implements a specific interface  
> (interface > no control over the object constructor) we use the  
> Configurable interface to set the NutchConf in a inversion of control  
> like style.
> (e.g. Plugin Extension Implementations like Parser or Protocols)


Ok.

> + PluginRegestry will not longer a singleton but will get an  
> constructor with a NutchConf instance.


Definitely yes.

> + Getting a Extension, require also a NutchConf that is injected in  
> case the Extension Object (e.g. a Parser) implements a Configurable  
> interface.


Yes. If you remember our discussion, I'd like also to follow a pattern 
where such instances are cached inside this NutchConf instance, if 
appropriate (i.e. if they are reusable and multi-threaded).

>
> Any comments, improvement suggestions, more use-cases?
> I would love to do this job, can I get a go from the other developers?


+1 from me.

-- 
Best regards,
Andrzej Bialecki     <><