Posted to dev@nutch.apache.org by Jérôme Charron <je...@gmail.com> on 2006/03/28 23:38:32 UTC

Refactoring some plugins

Hi,

Since the javadoc groups are filtered by package, I suggest refactoring
some plugins so that their packages differ from the package of their
extension point. They could then be listed in the Javadoc plugins group
instead of core.

The changes I suggest are:
1. Change package of urlfilter-automaton from org.apache.nutch.net to
org.apache.nutch.urlfilter.automaton
2. Change package of urlfilter-prefix from org.apache.nutch.net to
org.apache.nutch.urlfilter.prefix
3. Change package of urlfilter-regex from org.apache.nutch.net to
org.apache.nutch.urlfilter.regex
4. Change package of lib-regex-filter from org.apache.nutch.net to
org.apache.nutch.urlfilter.regex.api
5. Change package of ontology from org.apache.nutch.ontology to
org.apache.nutch.ontology.jena

Yes, I know, changing packaging for javadoc reasons is quite an "upside
down world" (sorry for this literal
translation from French), but it is the only way to do it since javadoc
groups are filtered by packages.
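
To illustrate why a package-based filter forces this move, here is a minimal sketch of group assignment by package pattern. The group titles, the patterns, and the first-match rule are assumptions made for the illustration, not the actual Nutch build configuration or the exact javadoc algorithm.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JavadocGroupSketch {

    // Group title -> package patterns, checked in insertion order.
    // Titles and patterns are hypothetical, chosen only for this example.
    static final Map<String, List<String>> GROUPS = new LinkedHashMap<>();
    static {
        GROUPS.put("URL Filter Plugins", List.of("org.apache.nutch.urlfilter.*"));
        GROUPS.put("Core", List.of("org.apache.nutch.*"));
    }

    // A pattern is either an exact package name or a prefix ending in '*'.
    public static boolean matches(String pattern, String pkg) {
        if (pattern.endsWith("*")) {
            return pkg.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(pkg);
    }

    // Returns the first group whose pattern matches (a simplification).
    public static String groupOf(String pkg) {
        for (Map.Entry<String, List<String>> entry : GROUPS.entrySet()) {
            for (String pattern : entry.getValue()) {
                if (matches(pattern, pkg)) {
                    return entry.getKey();
                }
            }
        }
        return "Other Packages";
    }

    public static void main(String[] args) {
        // A plugin left in the extension point's package cannot be told
        // apart from core code by a package filter.
        System.out.println(groupOf("org.apache.nutch.net"));             // Core
        // After the proposed move, the plugin lands in its own group.
        System.out.println(groupOf("org.apache.nutch.urlfilter.regex")); // URL Filter Plugins
    }
}
```

The filter only ever sees the package name, so two classes in org.apache.nutch.net always end up in the same group, whichever plugin or core module they come from.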

Does it make sense? Do you have any suggestions or comments?

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: Refactoring some plugins

Posted by Jérôme Charron <je...@gmail.com>.
> No, I don't think so.  These are strongly related bundles of plugins.
> When you change one chances are good you'll change the others, so it
> makes sense to keep their code together rather than split it up.  Folks
> can still find all implementations of an interface in the javadoc, just
> not always grouped together in the table of contents.

So, we agree.


> We could instead of calling these "misc" call them "compound plugins" or
> something.  We can change the package.html for each to list the
> coordinated set of plugins they provide.  For example,
> language-identifier's could say something like, "Includes parse, index
> and query plugins to identify, index and make searchable the identified
> language."

I plan to review all the package.html ... I will include those changes.
Thanks!

Jérôme

Re: Refactoring some plugins

Posted by Doug Cutting <cu...@apache.org>.
Jérôme Charron wrote:
> One more question about javadoc (I hope the last one):
> Do you think it makes sense to split the plugins gathered into the "Misc"
> group into many plugins (such as index-more / query-more), so that each
> sub-plugin can be dispatched into the proper group?

No, I don't think so.  These are strongly related bundles of plugins. 
When you change one chances are good you'll change the others, so it 
makes sense to keep their code together rather than split it up.  Folks 
can still find all implementations of an interface in the javadoc, just 
not always grouped together in the table of contents.

We could instead of calling these "misc" call them "compound plugins" or 
something.  We can change the package.html for each to list the 
coordinated set of plugins they provide.  For example, 
language-identifier's could say something like, "Includes parse, index 
and query plugins to identify, index and make searchable the identified 
language."

Doug

Re: Refactoring some plugins

Posted by Jérôme Charron <je...@gmail.com>.
> I'm reluctant to move the extension interface away from the parameter
> and return value classes used by that interface.

I'm reluctant too... I asked, in case someone has a magic idea...


>   Could we instead add a
> super-interface that all extension-point interfaces extend?  That way
> all of the extension points would be listed in javadoc as
> implementations of this interface.

+1 ... Committed.

One more question about javadoc (I hope the last one):
Do you think it makes sense to split the plugins gathered into the "Misc"
group into many plugins (such as index-more / query-more), so that each
sub-plugin can be dispatched into the proper group? Another solution could
be to use, within these plugins, a different package for each extension
they provide.

For instance, the language-identifier plugin could be split into the
following plugins:
* language-identifier
* parse-lang
* index-lang
* query-lang

Or simply refactor it into the following packages:
org.apache.nutch.analysis.lang
org.apache.nutch.parse.lang
org.apache.nutch.indexer.lang
org.apache.nutch.searcher.lang

Jérôme

Re: Refactoring some plugins

Posted by Doug Cutting <cu...@apache.org>.
Jérôme Charron wrote:
> Moreover, I would like to suggest some other javadoc "improvements" (?):
> 
> 1. Create a group for abstract plugins (like lib-http or lib-regex-filter)
> named for instance "Plugins API"

+1

> 2. Create a group for extension points (as far as I remember, one of the
> first problems when you want to extend nutch is finding where the hooks
> are, i.e. what the extension points are). Once again, since the
> javadoc groups are filtered by packages, each extension point interface must
> be moved to a specific package.
> The idea is then to move all the core extension points to a new package
> (for instance org.apache.nutch.api).

I'm reluctant to move the extension interface away from the parameter 
and return value classes used by that interface.  Could we instead add a 
super-interface that all extension-point interfaces extend?  That way 
all of the extension points would be listed in javadoc as 
implementations of this interface.
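
A minimal sketch of the super-interface idea follows. The names ExtensionPointMarker, UrlFilterPoint, and QueryFilterPoint are hypothetical and do not come from this thread or from the Nutch code base. Each extension-point interface stays in its own package next to its parameter and return types, and only gains an `extends` clause.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtensionPointSketch {

    /** Hypothetical marker super-interface shared by all extension points. */
    public interface ExtensionPointMarker {}

    /** Hypothetical extension point; it would stay in its current package. */
    public interface UrlFilterPoint extends ExtensionPointMarker {
        String filter(String url);
    }

    /** A second hypothetical extension point, in a different area of the code. */
    public interface QueryFilterPoint extends ExtensionPointMarker {
        String rewrite(String query);
    }

    // Collects the interfaces that extend the marker, roughly what javadoc
    // lists as the known subinterfaces of the marker.
    public static List<String> extensionPoints(Class<?>... candidates) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : candidates) {
            if (c.isInterface()
                    && c != ExtensionPointMarker.class
                    && ExtensionPointMarker.class.isAssignableFrom(c)) {
                names.add(c.getSimpleName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Runnable is unrelated to the marker and is filtered out.
        System.out.println(extensionPoints(
                UrlFilterPoint.class, QueryFilterPoint.class, Runnable.class));
        // prints [UrlFilterPoint, QueryFilterPoint]
    }
}
```

With this approach no class has to move, and the javadoc page of the marker interface becomes the single place that lists every hook.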

> 3. Create several javadoc plugin groups (one for each major kind of plugin:
> Indexing, Parsing, Protocol, Query, UrlFilter, and
> Misc for those that cannot be categorized).

+1

Doug

Re: Refactoring some plugins

Posted by Jérôme Charron <je...@gmail.com>.
> I don't think it's upside down.  Plugins should not share packages with
> core code, since that would permit them to use package-private APIs.
> Also, re-arranging the code to make the javadoc nice is right, since the
> javadoc is a primary means of describing the code.

Yes, but what I mean is that it is "strange" that it is a documentation
issue that raises this need for refactoring.

Moreover, I would like to suggest some other javadoc "improvements" (?):

1. Create a group for abstract plugins (like lib-http or lib-regex-filter)
named for instance "Plugins API"
2. Create a group for extension points (as far as I remember, one of the
first problems when you want to extend nutch is finding where the hooks
are, i.e. what the extension points are). Once again, since the
javadoc groups are filtered by packages, each extension point interface must
be moved to a specific package.
The idea is then to move all the core extension points to a new package
(for instance org.apache.nutch.api).
3. Create several javadoc plugin groups (one for each major kind of plugin:
Indexing, Parsing, Protocol, Query, UrlFilter, and
Misc for those that cannot be categorized).

Thanks for your suggestions and comments.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

Re: Refactoring some plugins

Posted by Doug Cutting <cu...@apache.org>.
Jérôme Charron wrote:
> The changes I suggest are:
> 1. Change package of urlfilter-automaton from org.apache.nutch.net to
> org.apache.nutch.urlfilter.automaton
> 2. Change package of urlfilter-prefix from org.apache.nutch.net to
> org.apache.nutch.urlfilter.prefix
> 3. Change package of urlfilter-regex from org.apache.nutch.net to
> org.apache.nutch.urlfilter.regex
> 4. Change package of lib-regex-filter from org.apache.nutch.net to
> org.apache.nutch.urlfilter.regex.api
> 5. Change package of ontology from org.apache.nutch.ontology to
> org.apache.nutch.ontology.jena

+1

> Yes, I know, changing packaging for javadoc reasons is quite an "upside
> down world" (sorry for this literal
> translation from French), but it is the only way to do it since javadoc
> groups are filtered by packages.

I don't think it's upside down.  Plugins should not share packages with
core code, since that would permit them to use package-private APIs. 
Also, re-arranging the code to make the javadoc nice is right, since the 
javadoc is a primary means of describing the code.
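
The package-private concern can be made concrete with a small sketch. CoreHelper and its methods are hypothetical stand-ins for a core class, not real Nutch code. The sketch uses reflection to list the members that have no access modifier, which are exactly the ones any class declared in the same package could call.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PackagePrivateSketch {

    /** Hypothetical stand-in for a core class. */
    public static class CoreHelper {
        // Public API: intended for plugins.
        public String normalize(String url) { return url.trim(); }

        // No modifier means package-private: visible to every class in the
        // same package, including a plugin that declares that package.
        String internalCacheKey(String url) { return "key:" + url; }

        // Private: never visible outside the class.
        private void reset() {}
    }

    // True when the method has none of public, protected, or private.
    public static boolean isPackagePrivate(Method m) {
        int mod = m.getModifiers();
        return !Modifier.isPublic(mod)
                && !Modifier.isProtected(mod)
                && !Modifier.isPrivate(mod);
    }

    // Names of the declared methods a same-package class could reach
    // beyond the public API, sorted for stable output.
    public static List<String> packagePrivateMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic() && isPackagePrivate(m)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(packagePrivateMethods(CoreHelper.class));
        // prints [internalCacheKey]
    }
}
```

Giving each plugin its own package means the compiler rejects any call to such members, so the public API remains the only contract between core and plugins.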

Doug