Posted to dev@nutch.apache.org by Jérôme Charron <je...@gmail.com> on 2005/08/27 10:16:29 UTC

Analysis plugins and lucene-analyzers

Hi,

I would like to add some language-specific analysis plugins. As a first
approach, each plugin would simply be a wrapper around Lucene's analyzers.
So each analysis-<lang> plugin needs to include
lucene-analyzers-1.9-rc1-dev.jar in its lib directory. To avoid
adding this jar to many plugins,
I would like to add lucene-analyzers-1.9-rc1-dev.jar to the Nutch core
lib.
Any comments? Any objections?

Regards

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/

Re: Analysis plugins and lucene-analyzers

Posted by Jérôme Charron <je...@gmail.com>.
> 
> I personally don't like the activation mechanism. I prefer to have the
> 'activated' plugins in the plugin folder, and to deactivate them by just
> removing them from the folder.
> That is much easier to handle than managing the plugins in the
> folder AND setting them up in the configuration file.

+1 

> Right. If a plugin A requires another plugin B that is not
> available, plugin A will not be loaded, as far as I remember. :-/

Stefan, here is my first feedback on the plugin dependencies:

1. How does one specify plugin dependencies?
I assume it is done with this directive in the plugin XML descriptor:
<requires>
<import plugin="plugin-id"/>
</requires>
which specifies that this plugin requires the "plugin-id" code in its classpath at
runtime. Is that right?
But looking at the plugin framework code, the "requires" element is never
parsed.
Can you confirm this point, please?

2. There is an addDependency(String id) method in the PluginDescriptor class,
but this method does not seem to be used in the Nutch code.

3. Out of scope, but if someone wants to contribute:
I added a direct dependency on the lucene-analysis plugin to my plugin's
build.xml file in order to compile it.
But it would be better if this kind of dependency were handled automatically
in the build-plugin.xml Ant file by parsing the plugins' XML descriptors.
(I'm not an Ant expert, but I assume it could
be done easily by writing an Ant task based on the PluginManifestParser.)
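
As a starting point for point 3, here is a minimal Java sketch that extracts the ids listed under <requires> from a descriptor. It uses only the JDK's DOM parser rather than Nutch's PluginManifestParser, it assumes the descriptor layout guessed at in point 1, and the class name RequiresReader and the plugin ids in the example are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of Nutch: lists the plugin ids a descriptor imports. */
final class RequiresReader {

  /** Returns the "plugin" attribute of every <import> nested in a <requires> element. */
  static List<String> requiredPlugins(String pluginXml) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(
          new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
      List<String> ids = new ArrayList<String>();
      NodeList requires = doc.getElementsByTagName("requires");
      for (int i = 0; i < requires.getLength(); i++) {
        NodeList imports = ((Element) requires.item(i)).getElementsByTagName("import");
        for (int j = 0; j < imports.getLength(); j++) {
          String id = ((Element) imports.item(j)).getAttribute("plugin");
          if (!id.isEmpty()) {
            ids.add(id); // skip <import> elements without a plugin attribute
          }
        }
      }
      return ids;
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse plugin descriptor", e);
    }
  }

  public static void main(String[] args) {
    // Illustrative descriptor; the ids are assumptions, not real Nutch plugin ids.
    String xml = "<plugin id=\"analysis-fr\">"
        + "<requires><import plugin=\"lucene-analysis\"/></requires>"
        + "</plugin>";
    System.out.println(requiredPlugins(xml)); // prints [lucene-analysis]
  }
}
```

An Ant task could call such a method on each plugin.xml and add the listed plugins to the compile classpath, but that wiring is left out here.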

Thanks for your comments. 

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/

Re: Analysis plugins and lucene-analyzers

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi Jérôme,

>>> I do not object to putting lucene-analyzers-1.9-rc1-dev.jar in
>>> the Nutch core, but I would like to offer another option. I think it is
>>> possible to create a plugin which contains and exports this library
>>> and make the other analysis plugins depend on it.
>>>
>
>> Yes, that is possible for sure.. I like this idea very much.. :)
> Yes, that was an option that I was thinking about.
> Stefan, in such a case, must this plugin be activated in the config
> file, or is it just dynamically loaded because the analysis plugins
> need it?

It must be activated. The activation mechanism was not planned and was
submitted later.
I personally don't like the activation mechanism. I prefer to have the
'activated' plugins in the plugin folder, and to deactivate them by just
removing them from the folder.
That is much easier to handle than managing the plugins in the
folder AND setting them up in the configuration file.

The best way would be to have different kinds of distributions...
(mini, intranet, web).

> If it must be "manually" activated in the conf file, that seems
> "dangerous" (it implies that Nutch users know the inter-plugin
> dependencies)
Right. If a plugin A requires another plugin B that is not
available, plugin A will not be loaded, as far as I remember. :-/

Thanks for taking care of this!
Stefan


Re: Analysis plugins and lucene-analyzers

Posted by Jérôme Charron <je...@gmail.com>.
> 
> > I do not object to putting lucene-analyzers-1.9-rc1-dev.jar in
> > the Nutch core, but I would like to offer another option. I think it is
> > possible to create a plugin which contains and exports this library
> > and make the other analysis plugins depend on it.

> Yes, that is possible for sure.. I like this idea very much.. :)

Yes, that was an option that I was thinking about.
Stefan, in such a case, must this plugin be activated in the config
file, or is it just dynamically loaded because the analysis plugins need it?
If it must be "manually" activated in the conf file, that seems
"dangerous" (it implies that Nutch users know the inter-plugin
dependencies).
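
The "dynamically loaded" alternative asked about here would amount to expanding the configured plugin list with everything those plugins require, directly or indirectly. The Java sketch below illustrates that idea only; it does not describe how Nutch behaves, and the class AutoActivation and the plugin ids are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not Nutch code: activates required plugins automatically. */
final class AutoActivation {

  /** Returns the activated ids plus everything they require, directly or transitively. */
  static Set<String> withDependencies(Collection<String> activated,
                                      Map<String, List<String>> requires) {
    Set<String> result = new LinkedHashSet<String>();
    Deque<String> pending = new ArrayDeque<String>(activated);
    while (!pending.isEmpty()) {
      String id = pending.removeFirst();
      if (result.add(id)) { // false when already visited, which also guards against cycles
        List<String> deps = requires.get(id);
        if (deps != null) {
          pending.addAll(deps);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> requires = new LinkedHashMap<String, List<String>>();
    requires.put("analysis-fr", Arrays.asList("lucene-analysis"));
    requires.put("analysis-de", Arrays.asList("lucene-analysis"));
    // The user lists only the language plugins; the shared library plugin is pulled in.
    System.out.println(withDependencies(Arrays.asList("analysis-fr", "analysis-de"), requires));
    // prints [analysis-fr, analysis-de, lucene-analysis]
  }
}
```

With such an expansion, users would not need to know the inter-plugin dependencies when editing the configuration file.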

Thanks for your responses and suggestions.

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/

Re: Analysis plugins and lucene-analyzers

Posted by Stefan Groschupf <sg...@media-style.com>.
Hi,
> I do not object to putting lucene-analyzers-1.9-rc1-dev.jar in
> the Nutch core, but I would like to offer another option. I think it is
> possible to create a plugin which contains and exports this library
> and make the other analysis plugins depend on it. I am not an expert in
> this, but I think such a solution is also possible. It is just a
> second idea for you to consider; I do not have a preference for
> either of these options.

Yes, that is possible for sure.. I like this idea very much.. :)

Stefan 

Re: Analysis plugins and lucene-analyzers

Posted by Piotr Kosiorowski <pk...@gmail.com>.
Hello,
I do not object to putting lucene-analyzers-1.9-rc1-dev.jar in
the Nutch core, but I would like to offer another option. I think it is
possible to create a plugin which contains and exports this library and
make the other analysis plugins depend on it. I am not an expert in this, but I
think such a solution is also possible. It is just a second idea for
you to consider; I do not have a preference for either of these options.
Regards
Piotr

Andrzej Bialecki wrote:
> Jérôme Charron wrote:
> 
>> Hi,
>>
>> I would like to add some language-specific analysis plugins. As a
>> first approach, each plugin would simply be a wrapper around Lucene's
>> analyzers.
>> So each analysis-<lang> plugin needs to include
>> lucene-analyzers-1.9-rc1-dev.jar in its lib directory. To
>> avoid adding this jar to many plugins, I would like to add
>> lucene-analyzers-1.9-rc1-dev.jar to the Nutch core lib.
>> Any comments? Any objections?
> 
> 
> I'm wondering if you could implement this plugin as a more or less 
> automatic wrapper around any Lucene classes that implement Analyzer, 
> i.e. so that it doesn't require recompiling to change/select the 
> language, or add a non-standard analyzer from the classpath. I think 
> it's possible to do this, but you would have to code a special case for
> Snowball analyzers, where the constructor requires an argument.
> All of this could be read from the plugin.xml or nutch-default.xml files.
> 
> 



Re: Analysis plugins and lucene-analyzers

Posted by Andrzej Bialecki <ab...@getopt.org>.
Jérôme Charron wrote:
> Hi,
> 
> I would like to add some language-specific analysis plugins. As a first
> approach, each plugin would simply be a wrapper around Lucene's analyzers.
> So each analysis-<lang> plugin needs to include
> lucene-analyzers-1.9-rc1-dev.jar in its lib directory. To avoid
> adding this jar to many plugins,
> I would like to add lucene-analyzers-1.9-rc1-dev.jar to the Nutch core
> lib.
> Any comments? Any objections?

I'm wondering if you could implement this plugin as a more or less 
automatic wrapper around any Lucene classes that implement Analyzer, 
i.e. so that it doesn't require recompiling to change/select the 
language, or add a non-standard analyzer from the classpath. I think 
it's possible to do this, but you would have to code a special case for
Snowball analyzers, where the constructor requires an argument.
All of this could be read from the plugin.xml or nutch-default.xml files.
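
A minimal Java sketch of such a reflective wrapper follows. To stay self-contained it does not depend on Lucene: the class ReflectiveFactory is hypothetical, and JDK classes stand in for analyzers. In a real plugin the expected type would presumably be Lucene's Analyzer, and the class name and optional argument would come from configuration; the sketch assumes the Snowball special case needs a single String argument.

```java
import java.lang.reflect.Constructor;

/** Hypothetical helper: creates an instance of a class chosen by name at runtime. */
final class ReflectiveFactory {

  /**
   * Instantiates className through its public no-argument constructor, or through
   * its public single-String constructor when arg is non-null (the special case).
   */
  static <T> T create(String className, String arg, Class<T> type) {
    try {
      Class<? extends T> cls = Class.forName(className).asSubclass(type);
      if (arg == null) {
        return cls.getConstructor().newInstance();
      }
      Constructor<? extends T> ctor = cls.getConstructor(String.class);
      return ctor.newInstance(arg);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    // StringBuilder stands in for an analyzer class; both values would normally
    // be read from a configuration file rather than hard-coded.
    CharSequence plain = create("java.lang.StringBuilder", null, CharSequence.class);
    CharSequence withArg = create("java.lang.StringBuilder", "English", CharSequence.class);
    System.out.println(plain.length());      // prints 0
    System.out.println(withArg.toString());  // prints English
  }
}
```

Selecting a different language or a non-standard analyzer would then only require changing the configured class name, without recompiling.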


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com